Domains or not domains?

ShuoYong Shi, Indraneel Majumdar and Nick V. Grishin

Howard Hughes Medical Institute, Department of Biochemistry,

University of Texas Southwestern

Medical Center at Dallas

http://prodata.swmed.edu/CASP8

Why domains?

Traditionally, CASP targets are evaluated as domains, i.e. each target structure is parsed into domains, and model quality is computed for each domain separately. This strategy makes sense for two reasons:

1. Domains can be mobile, and their relative packing can be influenced by ligand presence, by crystal packing in X-ray structures, or can be semi-random in NMR structures. Even a perfect prediction algorithm cannot cope with this adequately without knowing, for instance, the ligand presence or the crystal symmetry.

2. Predictions may be better or worse for individual domains than for their assembly. This happens when domains differ in predictability, e.g. one has a close template but the other does not. Even if the domains of a target are of equal prediction difficulty, the mutual domain arrangement in the target structure, while predictable in principle, may differ from the template and thus be modeled incorrectly by predictors.

Comparison of the whole-chain evaluation with the domain-based evaluation dissects the problem of 'individual domain' vs. 'domain assembly' modeling and should help in the development of prediction methods.

How domains?

Evolutionary domains correspond to structurally compact evolutionary modules.

Ago protein (T0487) consists of 5 domains.

http://prodata.swmed.edu/CASP8/evaluation/DomainDefinition.htm

Do we need domains?

122 targets, 176 evolutionary domains: do we need that many?

Server predictions help us reduce the number of domains:

If whole-chain prediction quality is not much different from domain prediction quality, domain evaluation is not necessary.

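GDT-TS(whole chain) vs. the length-weighted sum of domain GDT-TS scores:

\[
\text{GDT-TS}(\text{whole chain})
\quad \text{vs.} \quad
\frac{\sum_{i=1}^{\text{Number of domains}} \text{Length}(\text{domain } i) \cdot \text{GDT-TS}(\text{domain } i)}
     {\sum_{i=1}^{\text{Number of domains}} \text{Length}(\text{domain } i)}
\]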

http://prodata.swmed.edu/CASP8/evaluation/Domains.htm
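The length-weighted domain score compared against the whole-chain GDT-TS can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `weighted_domain_gdt_ts` and the two-domain example values are assumptions made up for the sketch.

```python
from typing import Sequence


def weighted_domain_gdt_ts(lengths: Sequence[int], scores: Sequence[float]) -> float:
    """Length-weighted mean of per-domain GDT-TS scores.

    lengths[i] is the number of residues in domain i,
    scores[i] is the GDT-TS of the model evaluated on domain i.
    """
    if len(lengths) != len(scores):
        raise ValueError("lengths and scores must have the same number of domains")
    total_length = sum(lengths)
    if total_length <= 0:
        raise ValueError("total domain length must be positive")
    weighted_sum = sum(length * score for length, score in zip(lengths, scores))
    return weighted_sum / total_length


# Hypothetical two-domain target (values invented for illustration only).
example_lengths = [100, 50]
example_scores = [80.0, 50.0]
y = weighted_domain_gdt_ts(example_lengths, example_scores)
print(y)  # (100*80 + 50*50) / 150 = 70.0
```

If this value is close to the whole-chain GDT-TS of the same model, splitting the target into domains changes little for that model.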

Correlation between the residue-weighted sum of GDT-TS scores from domain-based evaluation (y, vertical axis) and the whole-chain GDT-TS (x, horizontal axis).

T0490: correlation between whole chain and domain predictions

Each point represents the first model from one server. Green, gray and black points are the top 10 models, the bottom 25% and the remaining models, respectively. The blue line is the best-fit line through the origin (intercept 0) for the top 10 server models. The red line is the diagonal.

Two parameters to describe correlation between whole chain and domain predictions

1. The root mean square (RMS) difference between the weighted sum of GDT-TS on domains and GDT-TS on the whole chain (RMS of y−x) measures the absolute GDT-TS difference.

2. The slope of the best-fit line with intercept set to 0 measures the relative GDT-TS difference.

These parameters are computed on the top 10 predictions (ranked by the weighted sum).
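Both parameters can be sketched as follows. The slides do not state how the line is fitted, so this sketch assumes an ordinary least-squares fit through the origin, for which the slope is Σxy / Σx². The function name `top_n_rms_and_slope` and the input layout are likewise assumptions for illustration.

```python
import math
from typing import List, Tuple


def top_n_rms_and_slope(points: List[Tuple[float, float]], top_n: int = 10) -> Tuple[float, float]:
    """Return (RMS of y - x, zero-intercept slope) over the top_n models.

    Each point is (x, y): x is the whole-chain GDT-TS of a model,
    y is its length-weighted sum of domain GDT-TS scores.
    Models are ranked by y, as the slide describes.
    """
    top = sorted(points, key=lambda p: p[1], reverse=True)[:top_n]
    if not top:
        raise ValueError("no models supplied")
    # Absolute difference: root mean square of (y - x).
    rms = math.sqrt(sum((y - x) ** 2 for x, y in top) / len(top))
    # Relative difference: least-squares slope of y = slope * x (assumed fit method).
    sum_xx = sum(x * x for x, _ in top)
    if sum_xx == 0:
        raise ValueError("all whole-chain scores are zero; slope is undefined")
    slope = sum(x * y for x, y in top) / sum_xx
    return rms, slope
```

A slope near 1 with a small RMS means the domain-based and whole-chain scores agree for the best models; a slope well above 1 means the domains are predicted noticeably better than their assembly.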

Correlation between the residue-weighted sum of GDT-TS scores from domain-based evaluation (y, vertical axis) and the whole-chain GDT-TS (x, horizontal axis).

T0504 needs domain evaluation

Correlation between the residue-weighted sum of GDT-TS scores from domain-based evaluation (y, vertical axis) and the whole-chain GDT-TS (x, horizontal axis).

T0447 does not need domain evaluation

Domain swaps!

5 out of 122 targets (4%!) exhibit domain swaps, e.g. T0459:

Ribbon diagram of T0459: 3df8 chain A (rainbow) with its symmetry mate (white).

Ribbon diagram of T0459: 3df8 chain A with the swapped N-terminal β-hairpin from its symmetry mate (rainbow), and the symmetry mate chain that contributes the swapped hairpin (white).

Swapped domain in T0459

Correlation plots for the two domain definitions (swapped and swapped segment removed) of this single-domain target reveal differences

Whole chain T0459: 3df8 chain A

Domain-swapped T0459: 3df8 chain B*:-2-22 plus chain A:23-106

T0459 with domain-swapped segment removed: 3df8 chain A:23-106

All targets: correlation between the RMS difference between GDT-TS on domains and GDT-TS on the whole chain (vertical axis) and the slope of the best-fit line (horizontal axis), both computed on the top 10 server predictions.

Summary:

Comparison of domain-based predictions with whole-chain predictions revealed a natural, data-dictated cutoff (slope of the zero-intercept best-fit line above 1.3) for selecting targets that require domain-based evaluation. These 17 targets are:

T0397, T0405, T0407, T0409, T0416, T0419, T0429, T0443, T0457, T0462, T0472, T0478, T0487, T0496, T0501, T0504, T0510.

Predictions for the other targets follow the general trend: their quality is more similar for 'domain' and 'whole chain', and thus domain-based evaluation may not be necessary for them.
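The selection rule in this summary amounts to a simple threshold on the per-target slope. The sketch below assumes the slopes are already available in a dictionary keyed by target name. The target names and slope values are placeholders invented for illustration, not CASP8 results.

```python
from typing import Dict, List

SLOPE_CUTOFF = 1.3  # cutoff stated in the summary


def targets_needing_domain_evaluation(slopes: Dict[str, float],
                                      cutoff: float = SLOPE_CUTOFF) -> List[str]:
    """Return the targets whose zero-intercept best-fit slope is above the cutoff."""
    return sorted(target for target, slope in slopes.items() if slope > cutoff)


# Placeholder values only; these are not measured slopes for real targets.
example_slopes = {"target_A": 1.05, "target_B": 1.45, "target_C": 1.30}
selected = targets_needing_domain_evaluation(example_slopes)
print(selected)  # ['target_B']
```

A target whose slope equals the cutoff exactly is not selected here, since the summary specifies slopes above 1.3.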

Acknowledgement

Our group: Shuoyong Shi, Jing Tong, Ruslan Sadreyev, Lisa Kinch, Jimin Pei, Ming Tang, Sasha Safronova, Yuan Qi, Hua Cheng, Jamie Wrabl, Indraneel Majumdar, Erik Nelson, Yong Wang, S. Sri Krishna, Bong-Hyun Kim, Dorothee Staber

Collaborators: David Baker (U. Washington), Kimmen Sjölander (UC Berkeley), William Noble (U. Washington)

HHMI, NIH, UTSW, The Welch Foundation