

Gerstein.info/talks (c) 2003

Do not reproduce without permission

Permissions Statement

This Presentation is copyright Mark Gerstein, Yale University, 2005. Feel free to use images in it with PROPER acknowledgement (via citation to relevant papers or link to gersteinlab.org).


Computational Proteomics: Networks & Structures

Mark B Gerstein, Yale (Comp. Bio. & Bioinformatics)

Ottawa Health Research Institute

2006.10.23, 14:30 – 15:30


Omics Research at GersteinLab.org

• Human Genome Analysis (pseudogenes): finding genes, characterizing the function of intergenic regions, and analyzing protein fossils (pseudogenes)

• Eukaryotic Proteome Analysis (networks): using molecular networks to integrate functional genomics information and describe protein function on a genomic scale

• Structural Genomics (macromolecular motions): analyzing select populations of 3D structures in detail, trying to understand their flexibility in terms of packing


Outline: Computational Proteomics: Networks & Structures

• 3-D Structural Analysis of Protein Interaction Networks Gives New Insight Into Protein Function, Network Topology and Evolution
  - Interaction Networks and their properties
  - A 3-D structural point of view
  - Network properties revisited
  - TopNet Website

• Surveying Structural Motions in a DB Framework
  - Motions DB based on Simple Models for Protein Flexibility
  - Detailed Classification based on Interface Packing
    • Hinge & Shear
    • Packing Tools
  - Comprehensive Statistics on Flexibility over all Structures
    • [ Distributions ]
    • Hinge Survey


TopNet – an automated web tool

[Yu et al., 2004; Yip et al., 2005; similar tools include Cytoscape.org, Ideker, Sander et al.]

(vers. 2)

Normal website + downloadable code (Java) + web service (SOAP) with Cytoscape plugin


SVGA visualization, Network Mgt. (Multiple Network Support, tagging with DB)


Surveying structural flexibility on a proteomic scale

• Originally identified in early structures: Hb, ATCase, hexokinase

• Why study it?
  - Complicated biological phenomena that can be studied in quantitative detail
    • changes in 1000s of coordinates
  - Motions link structure & function (many functions are carried out by motions)
    • catalysis, regulation, transport, formation of assemblies, cellular locomotion
    • ligand binding
  - Structural genomics will produce many structures with slight variations on the same fold
  - The next step after fold classification is flexibility classification


New structures increasingly don't give new folds

from "Expectations for Structural Genomics" [Levitt, Protein Science 9: 197]


Surveying structural flexibility on a proteomic scale

• Questions
  - How do we describe a wide range of structural variability in standard terms?
  - Can we develop simple models to explain constraints on protein flexibility?
  - What information about flexible hinge location is encoded in sequence?


Computational Proteomics: Understanding Protein Flexibility in a Database Framework

1) Motions DB based on Simple Models for Protein Flexibility

- Rigid Core Models, Pathway Interpolation, NMA.

2) Detailed Classification of Small Subset of Motions based on Interface Packing

- Packing constrains motions. Two mechanisms, Hinge and Shear, depending on whether there is a well-packed interface, account for many motions (CS vs LF). More involved motions exist (e.g. Ig, GroEL).

3) Comprehensive Statistics on Flexibility over all Structures

- Putting individual motions into perspective from distributions. Some initial conclusions from datamining.




MolMovDB.org


Example "Morph": MBP

2 Known Crystal Structures (endpoints, not necessarily same seq.)

Std. Geometric Stats. (from structure comparison)

Pathway Interpolation


Motions: collecting together and annotating individual morphs into logical units

~19K morphs (instances of conformational variability; 384 canonical ones)

~200 classified motions


Morph Analysis System to Standardize Protein Motions and Create Pathways

Alignment → Superposition → Screw-Axis Orientation → Homogenization → Pathway Generation → Visual Rendering → Web Report
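The Superposition step can be illustrated with a standard least-squares (Kabsch/SVD) fit. This is a minimal sketch assuming the two structures have already been aligned residue-by-residue into equal-length (n, 3) coordinate arrays; `kabsch` and `rmsd` are hypothetical helper names, not taken from the MolMovDB code.

```python
import numpy as np


def kabsch(P, Q):
    """Rotation R and translation t minimizing sum ||R @ p_i + t - q_i||^2."""
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - pc).T @ (Q - qc)                # 3x3 covariance of the centred coordinates
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would indicate a reflection
    D = np.diag([1.0, 1.0, d])               # flip the last axis to keep a proper rotation
    R = Vt.T @ D @ U.T
    t = qc - R @ pc
    return R, t


def rmsd(P, Q):
    """Root-mean-square deviation between matched coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((P - Q) ** 2, axis=1))))
```

Usage: `R, t = kabsch(P, Q)` followed by `rmsd(P @ R.T + t, Q)` gives the RMS after optimal superposition; the same fit applied to a subset of atoms gives a per-core fit.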


Simple Two-Rigid-Core Model of Protein Motions: to what degree does it apply?

[Figure: Struc-1 and Struc-2, with Core-1 fit, Core-2 fit, and overall fit]

Std. statistics: RMS; Core 1 & 2 fits; Rot. & Trans.; Max. Disp.; Centroid-Screw-Axis Dist.
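Once the second core has been superposed (rotation R and translation t, applied as x' = R x + t), the rotation angle, the translation along the screw axis, and the centroid-to-screw-axis distance listed above can be derived from R and t. The sketch below assumes R is a proper rotation with an angle strictly between 0° and 180° (the axis formula it uses is undefined at exactly 0° or 180°); `screw_parameters` is a hypothetical helper, not the MolMovDB implementation.

```python
import numpy as np


def screw_parameters(R, t, centroid):
    """Screw-motion description of the rigid-body move x -> R @ x + t.

    Returns (angle in degrees, unit axis, translation along the axis,
    distance from `centroid` to the screw axis).
    Assumes 0 < angle < 180 degrees so that the axis is well defined.
    """
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    # Axis from the antisymmetric part of R (right-hand rule, positive angle).
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    axis = axis / (2.0 * np.sin(angle))
    shift = float(axis @ t)                       # translation along the axis
    # A point p on the axis satisfies (I - R) p = t - shift * axis;
    # lstsq returns the minimum-norm solution of this rank-2 system.
    p, *_ = np.linalg.lstsq(np.eye(3) - R, t - shift * axis, rcond=None)
    offset = centroid - p
    radial = offset - (offset @ axis) * axis      # part perpendicular to the axis
    return np.degrees(angle), axis, shift, float(np.linalg.norm(radial))
```

A large centroid-to-axis distance indicates that the core swings around an axis far from its own centre (a hinge-like motion), while a small distance indicates a twist about its own centre.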


Visualizing Pathways: Interpolation via Adiabatic Mapping

1' Interpolation step
1 Energy minimization (VDW + bonds) [CHARMM, ENCAD]
2 Re-interpolate, re-minimize…
* Slows down over humps

[Figure: Energy vs. Interpolation Step (steps 1 to 9), with points 1', 1 and 2 marked]


Adiabatic Mapping vs Linear Interpolation Strategies

Compared with Calmodulin

cm: 1ctr-1cl1


Adiabatic Mapping vs Linear Interpolation Strategies

Compared with Calmodulin

Frame 4 (adiabatic); cm: 1ctr-1cl1


Adiabatic Mapping vs Linear Interpolation Strategies

Compared with Calmodulin

[Figure: Frame 4 (adiabatic) vs. Frame 4 (linear); the linear frame is collapsed]
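The difference between the two strategies can be reproduced on a toy system. The sketch below is not the CHARMM/ENCAD-based procedure described above: it assumes a hypothetical energy made only of harmonic bond springs (rest length 1.0, force constant 1.0), minimized by plain steepest descent, and it rotates a single two-atom bond by 90°. Linear interpolation shortens the bond in the intermediate frames (the same kind of collapse seen for calmodulin), whereas re-interpolating from each minimized frame and re-minimizing keeps the bond near its rest length.

```python
import numpy as np

K_BOND = 1.0    # hypothetical spring constant
REST_LEN = 1.0  # hypothetical rest bond length


def bond_lengths(x):
    """Lengths of the bonds between consecutive atoms of an (n, 3) chain."""
    return np.linalg.norm(np.diff(x, axis=0), axis=1)


def bond_gradient(x):
    """Gradient of E = 0.5 * K * sum (d - REST_LEN)^2 over consecutive bonds."""
    vec = np.diff(x, axis=0)                     # bond vectors x[i+1] - x[i]
    d = np.linalg.norm(vec, axis=1)
    g = (K_BOND * (d - REST_LEN) / d)[:, None] * vec
    grad = np.zeros_like(x)
    grad[1:] += g                                # dE/dx[i+1]
    grad[:-1] -= g                               # dE/dx[i]
    return grad


def minimize(x, steps=500, lr=0.1):
    """Plain steepest descent on the bond energy."""
    x = x.copy()
    for _ in range(steps):
        x -= lr * bond_gradient(x)
    return x


def linear_path(start, end, n_frames):
    """Straight-line interpolation of every coordinate."""
    return [(1 - f) * start + f * end for f in np.linspace(0.0, 1.0, n_frames)]


def adiabatic_path(start, end, n_frames):
    """Step from the current minimized frame toward `end`, then re-minimize."""
    frames = [start.copy()]
    x = start.copy()
    for k in range(1, n_frames):
        x = x + (end - x) / (n_frames - k)       # re-interpolate over the remaining steps
        if k < n_frames - 1:                     # keep the final frame equal to `end`
            x = minimize(x)
        frames.append(x.copy())
    return frames
```

With 5 frames, the middle linear frame has a bond length of about 0.71, while every adiabatic frame stays at the rest length; in a real protein the minimization includes many more terms, so the adiabatic path can also slow down over energy humps.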


Other Dynamic Calculations to Model Pathway

• Simplest possible calculations here, but...
• Progressively add and subtract energy terms from pathway calculation
• Interoperate DB within framework of dynamics groups
• Normal Modes …


Normal Mode Analysis

• Describe flexibility of system by characteristic harmonic modes

• Calculate 20 lowest-freq. modes for 1 conformation of each pair in morph, using simple mass distribution and spring potential

• Find best linear combination of modes (v) fitting initial direction of observed motion

• Measure degree to which fit matches initial direction of the observed motion. Measure how concentrated linear combination is in a few modes (entropy ~ v ln v)
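The normal-mode steps above can be sketched with an anisotropic elastic-network model, one common form of "simple mass distribution and spring potential" (unit masses, identical springs between all atoms within a cutoff). The 8 Å cutoff, the spring constant, and the entropy definition (Shannon entropy of the squared, normalized fit coefficients, one reading of "entropy ~ v ln v") are assumptions for illustration, not the parameters used for MolMovDB.

```python
import numpy as np


def anm_hessian(coords, cutoff=8.0, gamma=1.0):
    """3n x 3n Hessian of an elastic network: springs between atoms within `cutoff`."""
    n = len(coords)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2
            H[3*i:3*i+3, 3*j:3*j+3] = block
            H[3*j:3*j+3, 3*i:3*i+3] = block
            H[3*i:3*i+3, 3*i:3*i+3] -= block
            H[3*j:3*j+3, 3*j:3*j+3] -= block
    return H


def lowest_modes(coords, n_modes=20, cutoff=8.0):
    """Lowest-frequency modes as orthonormal columns, skipping 6 rigid-body modes."""
    _, vecs = np.linalg.eigh(anm_hessian(coords, cutoff))   # eigenvalues ascending
    return vecs[:, 6:6 + n_modes]


def fit_motion(modes, displacement):
    """Least-squares fit of a motion direction by the modes.

    Returns (coefficients v, overlap in [0, 1], entropy of the v**2 weights).
    """
    target = displacement.ravel() / np.linalg.norm(displacement)
    v, *_ = np.linalg.lstsq(modes, target, rcond=None)
    overlap = float(np.linalg.norm(modes @ v))    # cosine between motion and its fit
    w = v ** 2 / np.sum(v ** 2)
    w = w[w > 0]
    entropy = float(-np.sum(w * np.log(w)))       # 0 when a single mode carries the fit
    return v, overlap, entropy
```

An overlap near 1 means the initial direction of the observed motion lies within the span of the low-frequency modes; a low entropy means only a few modes are needed to describe it.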


Computational Proteomics: Understanding Protein Flexibility in a Database Framework

1) Motions DB based on Simple Models for Protein Flexibility

- Rigid Core Models, Pathway Interpolation, NMA.

2) Detailed Classification of Small Subset of Motions based on Interface Packing

- Packing constrains motions. Two mechanisms, Hinge and Shear, depending on whether there is a well-packed interface, account for many motions (CS vs LF). More involved motions exist (e.g. Ig, GroEL).

3) Comprehensive Statistics on Flexibility over all Structures

- Putting individual motions into perspective from distributions. Some initial conclusions from datamining.


Breakdown of the Database

Submitted: >4400 user-submitted morphs

Manual: ~200 manually classified motions

Automatic: >14000 automatically classified motions


Classification of motions by packing

Submitted

Manual

Automatic


Interdigitating structure of protein interfaces constrains motion


Sliding Shear Motion Between Two Close-Packed Helices


2 Ideal Mechanisms, Shear and Hinge, depending on whether a well-packed interface is maintained continuously over the motion

• Mainchain packing. Shear: constrained by close packing. Hinge: free to kink.
• Mainchain torsions. Shear: many small changes. Hinge: a few large changes.
• Motion overall. Shear: concatenation of small local motions. Hinge: identical to twisting at hinge.
• Motion at interface. Shear: parallel to plane of interface (shear). Hinge: perpendicular to plane of interface, exposing & burying surfaces.
• Sidechain packing. Shear: same packing in both forms. Hinge: new contacts created; packing at base of hinge crucial.
• Sidechain torsions. Shear: mostly small changes. Hinge: some large changes.


Glutamate mutase: Intradomain Shear Motion

[Krautler]


Small Shearing Domain Motions: Molybdenum-binding protein & GAPDH

[Lawson] [Wonacott]


Citrate Synthase: Domain Motion with Shearing Helices


Ras: Hinged Loop

[SH Kim]


Troponin: fragment hinge motion of secondary structures

Absence of packing at joint

[M James]


Transferrin: Interdomain Hinges

[Baker]


Transferrin hinge involves absence of steric constraints (continuously maintained interfaces), esp. at hinge


Packing Tools - Voronoi software to calculate packing volumes
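A rough feel for what a Voronoi packing volume measures can be had from a grid approximation: every sample point in a box is assigned to its nearest atom centre, and each atom's volume is its number of samples times the sample-cell volume. This is plain (unweighted) Voronoi partitioning approximated on a grid, swapped in for brevity; the packing tools themselves compute the polyhedra and use atom-type radii, which this sketch ignores. The box bounds and grid step are assumptions, and cells at the edge of the box are clipped by it.

```python
import numpy as np


def grid_voronoi_volumes(atoms, lo, hi, step):
    """Approximate Voronoi cell volume of each atom inside the cube [lo, hi]^3.

    Each grid sample (at the centre of a step-sized cube) is assigned to the
    nearest atom centre; an atom's volume is its sample count times step**3.
    """
    axis = np.arange(lo + step / 2.0, hi, step)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    samples = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)
    # Squared distances from every sample to every atom, shape (n_samples, n_atoms).
    d2 = ((samples[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)
    counts = np.bincount(owner, minlength=len(atoms))
    return counts * step ** 3
```

For a simple cubic lattice with unit spacing, each interior cell is the unit cube around its atom, so the approximation returns a volume of 1.0 per atom; comparing such volumes with standard values for each atom type is what reveals tighter or looser packing.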


Volume Distribution and Std. Volume Typing for Atoms

Optimized Radii for Proteins and Nucleic Acids


Goal of helix.gersteinlab.org: to provide a comprehensive suite of tools for analyzing helix packing


helix.gersteinlab.org

enter PDB ID / upload PDB file

STRIDE processing

helix-helix interaction report (distance-based)

visualization of helix interactions (Jmol)

Voronoi calculation

visualization of helix-helix interface (VRML)

helix-helix contact area calculation (Jmol)

sequence motif search (Jmol)

report of atom-atom contacts from Voronoi calculation

PDB file verification and tool selection menu


[Figure panels (a) to (c): PDB ID 1C3W, helices 3 and 7; labeled residues Arg and Asp; intersection area = 23.3 Å², crossing angle = 24.6°]

[Figure panel (d): PDB ID 1C3W, GxxxG motif (residues 116-120A, GIMIG)]
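A simplified version of the crossing angle reported for helices 3 and 7 can be computed by fitting a straight axis to each helix and measuring the angle between the two axes. The sketch assumes each helix is given as an (n, 3) array of CA coordinates in N-to-C order and uses the principal axis from an SVD as the helix axis. The conventional crossing angle is a signed dihedral about the line of closest approach, which this unsigned version does not reproduce; the function names are hypothetical.

```python
import numpy as np


def helix_axis(ca):
    """Unit principal axis of a helix's CA coordinates, oriented N -> C."""
    centred = ca - ca.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    axis = vt[0]                                # direction of largest spread
    if np.dot(axis, ca[-1] - ca[0]) < 0:        # SVD sign is arbitrary; point N -> C
        axis = -axis
    return axis


def crossing_angle(ca1, ca2):
    """Unsigned angle (degrees) between the two oriented helix axes."""
    a1, a2 = helix_axis(ca1), helix_axis(ca2)
    cos = np.clip(np.dot(a1, a2), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))
```

Because the axes are oriented N to C, parallel helices give small angles and antiparallel helices give angles near 180°.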


Computational Proteomics: Understanding Protein Flexibility in a Database Framework

1) Motions DB based on Simple Models for Protein Flexibility

- Rigid Core Models, Pathway Interpolation, NMA.

2) Detailed Classification of Small Subset of Motions based on Interface Packing

- Packing constrains motions. Two mechanisms, Hinge and Shear, depending on whether there is a well-packed interface, account for many motions (CS vs LF). More involved motions exist (e.g. Ig, GroEL).

3) Comprehensive Statistics on Flexibility over all Structures

- Putting individual motions into perspective from distributions. Some initial conclusions from datamining.


Global Statistics from Comprehensive Analysis of Flexibility in the PDB

Submitted

Manual

Automatic

• "Unbiased" view of flexibility in the PDB

• Automatic structural alignments of all pairs in the PDB (based on fold classification)

• One subset of the ~14K is 3814 pairs with large structural differences (& acceptable morphs) but high sequence similarity


Rotation Distribution

[Figure: histogram of Frequency vs. Rotation Angle of 2nd Core (degrees)]


Translation Distribution

[Figure: histogram of Frequency vs. Translation of 2nd Core (Angstroms)]


Max Displacement Distribution

[Figure: histogram of Frequency vs. Max Atomic Displacement (Angstroms)]


An individual in the population: typical or unusual?

Average Displacement of Moving Core


TGL: Motion of Small Fragment

[Derewenda]

max-Disp: 13 Å (82%)

Trans: 1.7 Å (92%)

Rot: 2.7° (91%)

max-E: 30 (83%)


cAMP-dependent Protein Kinase: Complex Motion

[Taylor]

max-Disp: 7.8 Å (92%)

Trans: 0.98 Å (95%)

Rot: 4.9° (84%)

max-E: 23 (89%)


Diphtheria Toxin: Domain Swapping

[Figure: domains A and B in each structure]

[Eisenberg]

max-Disp: 60 Å (9.6%), Trans: 66 Å (20%), Rot: 62° (37%), max-E: 482 (11%)


Real Motion between Diverged Members of Periplasmic Binding Protein II Superfamily

(oligo-peptide & dipeptide binding proteins) [~26% identity]

max-Disp: 30 Å (53%), Trans: 8.7 Å (66%), Rot: 34° (48%), max-E: 59 (69%)

[Quiocho]


Degree to which initial direction of motion can be fit by a few modes


Flexibility Prediction from a Single Structure

• Hinge Atlas: a resource for statistical studies of protein flexibility

• Hinge information in sequence, using the Hinge Atlas

• Structure-based hinge predictors, tested using Hinge Atlas Gold


Relation between Vectors of Lowest Normal Mode and Obs. Motion for 4 Most Mobile Atoms: intra-domain motions in Calmodulin & bR

[Figure: lowest-normal-mode and observed-motion vectors; labeled residues THR 5, VAL 101, VAL 177, PHE 153, ASP 64, LYS 148, SER 147, THR 146; N- and C-termini marked]


Relation between Vectors of Lowest Normal Mode and Obs. Motion for 4 Most Mobile Atoms: inter-domain motion in T7 RNA polymerase

[Figure: the four most mobile atoms labeled 1 to 4; N- and C-termini marked]
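The comparison in these figures, between the lowest normal mode and the observed motion at the most mobile atoms, can be sketched as per-atom angles between displacement vectors. The sketch assumes the observed displacement and the mode are given as matched (n, 3) arrays, selects the atoms with the largest observed displacement, and flips the mode's overall sign (arbitrary for an eigenvector) so that it points along the observed motion; `mode_vs_motion_angles` is a hypothetical name.

```python
import numpy as np


def mode_vs_motion_angles(observed, mode, n_atoms=4):
    """Indices of the n_atoms most mobile atoms and the angle (degrees)
    between the observed and normal-mode displacement at each of them."""
    observed = np.asarray(observed, dtype=float)
    mode = np.asarray(mode, dtype=float)
    top = np.argsort(np.linalg.norm(observed, axis=1))[::-1][:n_atoms]
    o, m = observed[top], mode[top]
    if np.sum(o * m) < 0:          # eigenvector sign is arbitrary
        m = -m
    cos = np.sum(o * m, axis=1) / (np.linalg.norm(o, axis=1) * np.linalg.norm(m, axis=1))
    return top, np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Small angles at all four atoms indicate that the lowest mode already points along the observed motion, as in the intra-domain and inter-domain examples above.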


Data Mining & Clustering on Corpus of Statistics

• Data mining on statistics… The Hinge Atlas: hinge information in sequence

• Automatically characterize a submitted motion as similar to a previously observed motion

• Develop canonical set of motions


The Hinge Atlas

• Hundreds of protein pairs (morphs) observed
• Hinge regions manually selected
• Useful for testing hinge predictors or for statistical studies of hinge properties
• Hinge information can be transferred to homologs for which hinges are unknown
• Public involvement
• 214 nonredundant proteins annotated, and growing


Viewer and public interface

• Highlight hinges from Hinge Atlas annotation

• ‘Public hinge’ submissions taken from users

• We used this annotation to look for hinge information in sequence.


Glycine and serine are significantly more likely to occur in hinges.

Phenylalanine, valine, alanine, and leucine are less likely to occur.

[Figure: Amino acid frequency of occurrence in hinges. Log-odds (LOD score) and p-value of amino acid occurrence in hinges for PHE, CYS, TRP, VAL, ALA, ILE, LEU, LYS, GLU, MET, ARG, ASP, GLN, ASN, THR, TYR, HIS, PRO, GLY, SER]
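The log-odds values on these plots follow the Hinge Index idea used by HingeSeq: the log of how much more often a residue type appears in hinges than in the data overall. A minimal sketch, assuming a flat list of residue names with a parallel boolean hinge annotation (hypothetical inputs, not the Hinge Atlas file format); residue types never seen in hinges are skipped rather than smoothed with pseudocounts, and p-values are not computed.

```python
import math
from collections import Counter


def hinge_indices(residues, is_hinge):
    """HI(a) = log10( p(a | hinge) / p(a) ) for each residue type seen in hinges."""
    overall = Counter(residues)
    in_hinge = Counter(r for r, h in zip(residues, is_hinge) if h)
    n_all = len(residues)
    n_hinge = sum(1 for h in is_hinge if h)
    hi = {}
    for aa, n in overall.items():
        k = in_hinge.get(aa, 0)
        if k == 0:
            continue  # log10(0) is undefined; a real analysis would add pseudocounts
        hi[aa] = math.log10((k / n_hinge) / (n / n_all))
    return hi
```

A positive index marks a residue type enriched in hinges (as reported for glycine and serine); a negative one marks a depleted type.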


Are hinges segregated by secondary structure?

Hinges were found to occur preferentially in turns and disordered regions, and to avoid alpha helices.

[Figure: Hinge coincidence with secondary structure. LOD and p-value for Other, Alpha helix, Beta sheet, Turn, Random coil]


Are certain physicochemical properties preferred in hinge residues?

High confidence that small residues are preferred and that aliphatic and hydrophobic residues are avoided.

[Figure: Hinge coincidence with physicochemical property. LOD and p-value for Aliphatic, Aromatic, Hydrophobic, Negative, Charged, Positive, Polar, Small, Tiny]


Do hinges coincide with active sites?

We found that computer-annotated hinges had a significant tendency to coincide with active-site residues from the Catalytic Site Atlas. No significant coincidence was found for residues near the hinge.

[Figure: Hinge and active site residues. LOD and p-value vs. distance from active site (0 to 10 residues)]


Are hinge residues conserved in evolution?

Hinge residues were anti-conserved, or rather hypermutable.

[Figure: Hinge occurrence vs. conservation. LOD and p-value across conservation bins: Top 1/5th, 2nd 1/5th, 3rd 1/5th, 4th 1/5th, Bottom 1/5th]


Hypermutability is probably due to appearance on surface

[Figure: Hinge occurrence vs. solvent accessible surface area. Hinge index (HI) and p-value for ASA bins 1 to 5, with p-values of 3·10⁻¹², 3·10⁻⁹, 1·10⁻⁶ and <10⁻³⁰ annotated]

Hinge residues tended to occur on the surface with extremely high significance.


HingeSeq: A sequence-based hinge predictor

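• Uses the definition of the Hinge Index, where h denotes hinge residues:

$$\mathrm{HI}(a_i) = \log_{10}\frac{p(a_i \mid h)}{p(a_i)}$$

• And sums Hinge Indices for residue type, secondary structure, and active site annotation, assuming independence:

$$\mathrm{HingeSeq}(i) = \log_{10}\frac{p(a^{aa}_j \mid h)\,p(a^{ss}_k \mid h)\,p(a^{active}_l \mid h)}{p(a^{aa}_j)\,p(a^{ss}_k)\,p(a^{active}_l)} = \mathrm{HI}_{aa}(i) + \mathrm{HI}_{ss}(i) + \mathrm{HI}_{active}(i)$$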

[Figure: ROC curve for HingeSeq; x-axis FP/(FP+TN), y-axis TP/(TP+FN)]
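Scoring and evaluation can be sketched in a few lines: HingeSeq adds the per-feature Hinge Indices for a residue, and an ROC curve like the one above comes from sweeping a score threshold and recording FP/(FP+TN) against TP/(TP+FN). The lookup tables passed in are hypothetical placeholders, not the published Hinge Index values, and tied scores are stepped through one residue at a time rather than grouped.

```python
def hingeseq_score(aa, ss, active, hi_aa, hi_ss, hi_active):
    """HingeSeq(i) = HI_aa + HI_ss + HI_active, assuming independent features."""
    return hi_aa[aa] + hi_ss[ss] + hi_active[active]


def roc_points(scores, labels):
    """(FP/(FP+TN), TP/(TP+FN)) after each residue, best-scoring first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A predictor that ranks every hinge residue above every non-hinge residue reaches an area of 1.0; random ranking gives about 0.5.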


Computational Proteomics: Understanding Protein Flexibility in a Database Framework

1) Motions DB based on Simple Models for Protein Flexibility

- Rigid Core Models, Pathway Interpolation, NMA.

2) Detailed Classification of Small Subset of Motions based on Interface Packing

- Packing constrains motions. Two mechanisms, Hinge and Shear, depending on whether there is a well-packed interface, account for many motions (CS vs LF). More involved motions exist (e.g. Ig, GroEL).

3) Comprehensive Statistics on Flexibility over all Structures

- Putting individual motions into perspective from distributions. Some initial conclusions from datamining.


Acknowledgements

• Motions: S Flores, N Echols, V Alexandrov, W Krebs, D Milburn, U Lehnert, A Counterman, K Keating, J Lu

• N Voss, J Tsai

• NIH, NSF

• Networks: P Kim, J Lu, Y Xia, K Yip, H Yu, V Trifonov, A Paccanaro