Protein Mutational Analysis Using Statistical Geometry Methods Majid Masso [email protected] http://mason.gmu.edu/~mmasso Bioinformatics and Computational Biology George Mason University

Page 1: Protein Mutational Analysis Using Statistical Geometry Methods

Protein Mutational Analysis Using Statistical Geometry Methods

Majid Masso [email protected]

http://mason.gmu.edu/~mmasso

Bioinformatics and Computational Biology

George Mason University

Page 2: Protein Mutational Analysis Using Statistical Geometry Methods

Protein Basics

formed by linearly linking amino acid residues (aa’s are the building blocks of proteins)

20 distinct aa types: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

[Figure: structure of leucine (Leu or L). The backbone (+H3N, Cα with its H, and the COO- group) is identical for all amino acids; the side chain (R group) is unique for each amino acid.]

[Figure: two amino acids with side chains R1 and R2 join, releasing H2O, to form a peptide bond.]

Page 3: Protein Mutational Analysis Using Statistical Geometry Methods

Amino Acid Groups

Branden/Tooze (affinity for water)
hydrophobic aa’s: A, V, L, I, M, P, F
hydrophilic aa’s:
  polar: N, Q, W, S, T, G, C, H, Y
  charged: D, E, R, K

Dayhoff (similar with respect to structure or function)
(A,S,T,G,P), (V,L,I,M), (R,K,H), (D,E,N,Q), (F,Y,W), (C)

conservative substitution: replacement with an amino acid from within the same class
non-conservative substitution: interclass replacement
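The conservative / non-conservative distinction can be sketched in a few lines of Python. This is only an illustration built on the six Dayhoff classes listed above; the names `DAYHOFF_CLASSES`, `CLASS_OF`, and `is_conservative` are made up for this sketch and do not come from the original work.

```python
# Dayhoff classes as listed on the slide.
DAYHOFF_CLASSES = ["ASTGP", "VLIM", "RKH", "DENQ", "FYW", "C"]

# Map each one-letter amino acid code to the index of its class.
CLASS_OF = {aa: idx for idx, members in enumerate(DAYHOFF_CLASSES) for aa in members}


def is_conservative(wild_type: str, mutant: str) -> bool:
    """Return True when both residues fall in the same Dayhoff class."""
    return CLASS_OF[wild_type] == CLASS_OF[mutant]


print(is_conservative("L", "I"))  # both in (V,L,I,M)
print(is_conservative("D", "K"))  # (D,E,N,Q) versus (R,K,H)
```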

Page 4: Protein Mutational Analysis Using Statistical Geometry Methods

Protein Basics

genes: code, or “blueprint”
proteins: product, or “building”
protein structure gives rise to function
why do “things go wrong”?
  mistakes in “blueprint”
  incorrectly built, or nonexistent, “buildings”
Protein Data Bank (PDB): repository of protein structural data, including 3D coordinates of all atoms (www.rcsb.org/pdb/)

[Figure: protein structure, PDB ID: 1REZ]

Structure reference: Muraki M., Harata K., Sugita N., Sato K., Origin of carbohydrate recognition specificity of human lysozyme revealed by affinity labeling, Biochemistry 35 (1996)

Page 5: Protein Mutational Analysis Using Statistical Geometry Methods

Computational Geometry Approach to Protein Structure Prediction

Tessellation
protein structure represented as a set of points in 3D, using Cα coordinates
Voronoi tessellation: convex polyhedra, each contains one Cα, all interior points closer to this Cα than any other
Delaunay tessellation: connect four Cα whose Voronoi polyhedra meet at a common vertex
vertices of Delaunay simplices objectively define a set of four nearest-neighbor residues (quadruplets)
5 classes of Delaunay simplices
Quickhull algorithm (qhull program), Barber et al., UMN Geometry Center

[Figure: Voronoi/Delaunay tessellation in 2D space. Voronoi tessellation: dashed line; Delaunay tessellation: solid line. (Adapted from Singh R.K., et al. J. Comput. Biol., 1996, 3, 213-222.)]

[Figure: Five classes of Delaunay simplices, {1-1-1-1}, {2-1-1}, {2-2}, {3-1}, {4}, with vertices labeled by sequence position (i, i+1, i+2, i+3, j, j+1, k, l). (Adapted from Singh R.K., et al. J. Comput. Biol., 1996, 3, 213-222.)]
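A minimal sketch of how a simplex could be assigned to one of the five classes. It assumes that the class labels give the lengths of the runs of sequence-adjacent residues among the four vertices (as the i, i+1, ... labels in the figure suggest), and that the positions come from a single chain with contiguous numbering. The function name `simplex_class` is hypothetical.

```python
def simplex_class(positions):
    """Classify a quadruplet by its runs of consecutive sequence positions.

    Returns one of "1-1-1-1", "2-1-1", "2-2", "3-1", "4".
    """
    ordered = sorted(positions)
    if len(ordered) != 4 or len(set(ordered)) != 4:
        raise ValueError("expected four distinct sequence positions")
    runs = [1]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur == prev + 1:
            runs[-1] += 1   # extends the current run of sequence neighbors
        else:
            runs.append(1)  # starts a new run
    return "-".join(str(n) for n in sorted(runs, reverse=True))


for quad in [(10, 11, 12, 13), (10, 11, 12, 40), (10, 11, 40, 41),
             (10, 11, 40, 70), (10, 30, 50, 70)]:
    print(quad, simplex_class(quad))
```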

Page 6: Protein Mutational Analysis Using Statistical Geometry Methods

Counting Quadruplets

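assuming order independence among residues comprising Delaunay simplices, the maximum number of all possible combinations of quadruplets forming such simplices is 8855:

D F E C (four distinct types): $\binom{20}{4} = 4845$
C C D E (one pair, two singles): $20 \cdot \binom{19}{2} = 3420$
C C D D (two pairs): $\binom{20}{2} = 190$
C C C D (one triple, one single): $20 \cdot 19 = 380$
C C C C (all four identical): $20$

Total: $4845 + 3420 + 190 + 380 + 20 = 8855$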
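The figure of 8855 can be checked by brute force: enumerate every unordered quadruplet of the 20 amino acid types, repeats allowed, and group them by composition pattern. This is a small verification sketch, not part of the original analysis.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every unordered quadruplet of residue types, repeats allowed.
quadruplets = list(combinations_with_replacement(AMINO_ACIDS, 4))

# Group by composition pattern, e.g. (2, 1, 1) for a quadruplet like C C D E.
by_pattern = Counter(
    tuple(sorted(Counter(q).values(), reverse=True)) for q in quadruplets
)

# Closed form for multisets of size 4 drawn from 20 types.
closed_form = comb(20 + 4 - 1, 4)

print(len(quadruplets), closed_form)
for pattern, count in sorted(by_pattern.items()):
    print(pattern, count)
```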

Page 7: Protein Mutational Analysis Using Statistical Geometry Methods

Residue Environment Scores

log-likelihood: $q_{ijkl} = \log\left(f_{ijkl} / p_{ijkl}\right)$

$f_{ijkl}$ = normalized frequency of quadruplets containing residues i,j,k,l in a representative training set of high-resolution protein structures with low primary sequence identity

i.e., $f_{ijkl}$ = total number of quadruplets in dataset containing only residues i,j,k,l divided by total number of observed quadruplets

$p_{ijkl}$ = frequency of random occurrence of the quadruplet (multinomial): $p_{ijkl} = c\, a_i a_j a_k a_l$

i.e., $a_i$ = total number of occurrences of residue i divided by total number of residues in the dataset, and $c = \dfrac{4!}{\prod_{i=1}^{n} t_i!}$, where n = number of distinct residue types in the quadruplet, and $t_i$ is the number of residues of type i.
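A minimal sketch of the score for one quadruplet: the log of the observed quadruplet frequency f over the multinomial expectation p = c · a_i a_j a_k a_l, with c = 4! divided by the product of t_i! over the distinct residue types. The slide does not state the base of the logarithm; base 10 is an assumption here. The counts are toy values invented for the example, and the function names are hypothetical.

```python
from collections import Counter
from math import factorial, log10


def multinomial_coefficient(quad):
    """c = 4! / prod(t_i!), with t_i the count of each distinct residue type."""
    denom = 1
    for t in Counter(quad).values():
        denom *= factorial(t)
    return factorial(len(quad)) // denom


def log_likelihood(quad, quad_counts, residue_counts):
    """q = log10(f / p) for one unordered quadruplet of residue letters."""
    key = tuple(sorted(quad))
    f = quad_counts[key] / sum(quad_counts.values())  # observed frequency
    total_residues = sum(residue_counts.values())
    p = multinomial_coefficient(key)                  # expected frequency
    for aa in key:
        p *= residue_counts[aa] / total_residues
    return log10(f / p)


# Toy counts (assumed, not real training data): keys are sorted residue tuples.
quad_counts = Counter({("A", "A", "L", "L"): 3, ("A", "L", "L", "L"): 1})
residue_counts = Counter({"A": 50, "L": 50})

print(log_likelihood("LALA", quad_counts, residue_counts))
```

A quadruplet type that never occurs in the training set has f = 0 and would need separate handling, since the logarithm is undefined there.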

Page 8: Protein Mutational Analysis Using Statistical Geometry Methods

Residue Environment Scores

total statistical potential (topological score) of protein: sum the log-likelihoods of all quadruplets forming the Delaunay simplices

individual residue potentials: sum the log-likelihoods of all quadruplets in which the residue participates (yields a 3D-1D potential profile)

[Figure: 3phv Potential Profile. x-axis: Residue Number (0 to 100); y-axis: Potential (-2 to 12). PDB ID: 3phv, HIV-1 Protease Monomer, 99 amino acids (total potential 27.93).]

Structure reference: R. Lapatto, T. Blundell, A. Hemmings, et al., X-ray analysis of HIV-1 proteinase at 2.7 Å resolution confirms structural homology among retroviral enzymes, Nature 342 (1989) 299-302.
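The two sums can be sketched as follows: the total potential adds the score of every simplex, and the residue profile adds each simplex score to the four positions it contains. The data layout is an assumption made for illustration (simplices as 4-tuples of 0-based sequence indices, scores in a dictionary keyed by sorted residue letters), and the toy values are invented.

```python
def simplex_score(simplex, sequence, scores):
    """Look up the log-likelihood of a simplex by its residue composition."""
    key = tuple(sorted(sequence[i] for i in simplex))
    return scores[key]


def total_potential(simplices, sequence, scores):
    """Sum the scores of all simplices in the tessellation."""
    return sum(simplex_score(s, sequence, scores) for s in simplices)


def potential_profile(simplices, sequence, scores):
    """For each position, sum the scores of the simplices it participates in."""
    profile = [0.0] * len(sequence)
    for simplex in simplices:
        q = simplex_score(simplex, sequence, scores)
        for i in simplex:
            profile[i] += q
    return profile


# Toy input (assumed values, not from a real structure).
sequence = "ACDEF"
simplices = [(0, 1, 2, 3), (1, 2, 3, 4)]
scores = {("A", "C", "D", "E"): 0.5, ("C", "D", "E", "F"): -0.25}

print(total_potential(simplices, sequence, scores))
print(potential_profile(simplices, sequence, scores))
```

Because every simplex contributes to exactly four residues, the profile values sum to four times the total potential.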

Page 9: Protein Mutational Analysis Using Statistical Geometry Methods

Properties of HIV-1 Protease

functional as a homodimer
99 residues per subunit
monomers form an intermolecular two-fold axis of symmetry
approximate intramolecular two-fold axis of symmetry
dimer interface: N and C termini (P1-T4 & C95-F99, respectively) form a four-stranded beta sheet
active site triad: D25-T26-G27
h-phobic flaps (M46-V56) are also G-rich, providing flexibility
  accommodate / interact with substrate molecule

Figure adapted from URL: http://mcl1.ncifcrf.gov/hivdb/Informative/Facts/facts.html

Page 10: Protein Mutational Analysis Using Statistical Geometry Methods

HIV-1 Protease Comprehensive Mutational Profile (CMP)

mutate 19 times the residue present at each of the 99 positions in the primary sequence
get total potential and potential profile of each artificially created mutant protein
create 20x99 matrix containing total potentials of all the single residue mutants
  columns labeled with residues in the primary sequence of wild-type (WT) HIV-1 protease monomer, and rows labeled with the 20 naturally occurring amino acids
subtract WT total potential (TP) from each cell, then average columns to get CMP:

$\mathrm{CMP}_j = \frac{1}{20}\sum_{i=1}^{20}\left[(\text{mutant TP})_{ij} - (\text{WT TP})\right] = \frac{1}{20}\sum_{i=1}^{20}\left[(\text{mutant TP})_{ij} - 27.93\right], \quad j = 1,\ldots,99$

[Figure: 3phv Comprehensive Mutational Profile. x-axis: Residue Number (0 to 100); y-axis: Mean Change in Total Protein Potential (-8 to 4).]
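The column averaging can be sketched as below. The function takes a matrix of mutant total potentials (rows: substituted residue, columns: sequence position) and the WT total potential, and returns the mean difference per column. For the protease the matrix would be 20x99 with WT TP = 27.93, and the WT row contributes zero in each column. The small matrix here is a toy with invented values, and the function name is hypothetical.

```python
def comprehensive_mutational_profile(mutant_tp, wt_tp):
    """Column means of (mutant TP - WT TP) for a rows-by-positions matrix."""
    n_rows = len(mutant_tp)
    n_cols = len(mutant_tp[0])
    return [
        sum(mutant_tp[i][j] - wt_tp for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]


# Toy 3x2 matrix (assumed values); the first row plays the role of the WT row.
toy_matrix = [
    [10.0, 10.0],
    [8.0, 11.0],
    [9.0, 6.0],
]
print(comprehensive_mutational_profile(toy_matrix, wt_tp=10.0))
```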

Page 11: Protein Mutational Analysis Using Statistical Geometry Methods

3phv Clustered Comprehensive Mutational Profiles

[Figure: two bar-chart panels of Mean Change in Total Protein Potential by Residue, shown for positions P1-L5, E21-D30, A71-T80, and C95-F99. First panel (y-axis -10 to 4): series C (conservative), NC (non-conservative), and ALL. Second panel (y-axis -12 to 6): series H-phobic, Charged, Polar, and Total.]

Page 12: Protein Mutational Analysis Using Statistical Geometry Methods

3phv Comprehensive Mutational Profile vs. Potential Profile

[Figure: scatter plot with one labeled point per residue, P1 through F99. x-axis: Individual Residue Potentials of Wild-Type Protein (potential of residue j in WT HIV-1 protease), -2 to 12; y-axis: Mean Change in Total Protein Potential (CMP_j), -8 to 4.]

Read in order, the point labels give the WT HIV-1 protease monomer sequence:
PQITLWQRPL VTIKIGGQLK EALLDTGADD TVLEEMSLPG RWKPKMIGGI GGFIKVRQYD QILIEICGHK AIGTVLVGPT PVNIIGRNLL TQIGCTLNF

Page 13: Protein Mutational Analysis Using Statistical Geometry Methods

3phv Comprehensive Non-Conservative Mutational Profile vs. Potential Profile

[Figure: scatter plot with one labeled point per residue, P1 through F99. x-axis: Individual Residue Potentials of Wild-Type Protein, -2 to 12; y-axis: Mean Change in Overall Protein Potential, -10 to 4.]

Page 14: Protein Mutational Analysis Using Statistical Geometry Methods

3phv Comprehensive Conservative Mutational Profile vs. Potential Profile

[Figure: scatter plot with one labeled point per residue, P1 through F99. x-axis: Individual Residue Potentials of Wild-Type Protein, -2 to 12; y-axis: Mean Change in Overall Protein Potential, -3 to 1.]

Page 15: Protein Mutational Analysis Using Statistical Geometry Methods

Experimental Data

536 single point missense mutations
  336 published mutants: Loeb D.D., Swanstrom R., Everitt L., Manchester M., Stamper S.E., Hutchison III C.A. Complete mutagenesis of the HIV-1 protease. Nature, 1989, 340, 397-400
  200 mutants provided by R. Swanstrom (UNC)
each mutant placed in one of 3 phenotypic categories, positive, negative, or intermediate, based on activity
mutant activity to be compared with change in sequence-structure compatibility elucidated by potential data

Page 16: Protein Mutational Analysis Using Statistical Geometry Methods

Experimental Data

Page 17: Protein Mutational Analysis Using Statistical Geometry Methods

3phv Structure-Function Correlations

[Figure: bar chart, HIV-1 Protease Mutagenesis Data. x-axis: HIV-1 Protease Assay (Positive, Intermediate, Negative); y-axis: Average Change in Potential (-1.80 to 0.00). Plotted values:]

       Positive   Intermediate   Negative
ALL    -0.23      -0.74          -1.39
C      -0.14      -0.75          -0.23
NC     -0.29      -0.73          -1.65

Observations
set of mutants with unaffected protease activity exhibit minimal (negative) change in potential
set of mutants that inactivate protease exhibit large negative change in potential, weighted heavily by NC
set of mutants with intermediate phenotypes exhibit moderate negative change in potential (similar among C and NC); wide range for intermediate phenotype in the experiments

Page 18: Protein Mutational Analysis Using Statistical Geometry Methods

Evolutionarily Conserved Residue Positions

Page 19: Protein Mutational Analysis Using Statistical Geometry Methods

Apply chi-square test statistic on tables above, with the null hypothesis being no association between residue position conservation and level of sensitivity to mutation:

LHS table (1 df): χ² = 10.44, reject null with p < 0.01
RHS table (2 df): χ² = 75.49, reject null with p < 0.001
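The two p-value bounds can be checked from the reported statistics alone. The sketch below uses the closed-form upper-tail probabilities of the chi-square distribution for 1 and 2 degrees of freedom, plus a generic Pearson statistic for a contingency table. The contingency tables themselves are not reproduced in this transcript, so no table values are assumed; the function names are hypothetical.

```python
from math import erfc, exp, sqrt


def chi2_sf(statistic, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if df == 1:
        return erfc(sqrt(statistic / 2.0))
    if df == 2:
        return exp(-statistic / 2.0)
    raise ValueError("closed forms are given only for df = 1 and df = 2")


def chi2_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return sum(
        (obs - r * c / grand) ** 2 / (r * c / grand)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )


print(chi2_sf(10.44, 1))  # statistic reported for the LHS table
print(chi2_sf(75.49, 2))  # statistic reported for the RHS table
```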

Page 20: Protein Mutational Analysis Using Statistical Geometry Methods

Mutagenesis at the Dimer Interface

Q2, T4, T96, and N98 are polar and side chains directed outward; P1, I3, L97, and F99 are hydrophobic and side chains directed toward body

F99 in one subunit makes extensive contacts with I3, V11, L24, I66, C67, I93, C95, and H96 in the complementary chain

[Figure: Impact of the F99A Mutation in One Chain of the HIV-1 Protease on Contacts in the Complementary Subunit. x-axis: Residue Number (0 to 100); y-axis: Difference in Residue Potential (F99A - WT), -1.0 to 0.2.]

Page 21: Protein Mutational Analysis Using Statistical Geometry Methods

Mutagenesis at the Dimer Interface

Alanine scan conducted on interface residues individually and in pairs, in one subunit and in both chains; activity of mutants measured by % cleavage of β-galactosidase containing a protease cleavage site

S. Choudhury, L. Everitt, S.C. Pettit, A.H. Kaplan, Mutagenesis of the dimer interface residues of tethered and untethered HIV-1 protease result in differential activity and suggest multiple mechanisms of compensation, Virology 307 (2003) 204-212.

Results: Good correlation between % cleavage (protease activity) and topological scores (protease sequence-structure compatibility)

Page 22: Protein Mutational Analysis Using Statistical Geometry Methods

Structure-Function Correlation Based on Mutations in Both Subunits of HIV-1 Protease

[Figure: scatter plot with linear fit, R² = 0.61. x-axis: % Cleavage (0 to 1); y-axis: Difference in Topological Scores (Mutant - WT), -6 to 3. Labeled points: WT, P1A, Q2A, I3A, T4A, T96A, L97A, N98A, N98D, F99A, Q2A+I3A, I3A+T4A, T96A+L97A, L97A+N98A.]

Page 23: Protein Mutational Analysis Using Statistical Geometry Methods

Structure-Function Correlation Based on Mutations in One Subunit of HIV-1 Protease

[Figure: scatter plot with linear fit, R² = 0.57. x-axis: % Cleavage (0 to 1); y-axis: Difference in Topological Scores (Mutant - WT), -3.5 to 0.5. Labeled points: WT, P1A, Q2A, I3A, T4A, T96A, L97A, N98A, N98D, F99A, Q2A+I3A, I3A+T4A, T96A+L97A, L97A+N98A.]

Page 24: Protein Mutational Analysis Using Statistical Geometry Methods

Conformational Changes Due to Dimerization and/or Ligand Binding

PDB ID: 1g35 HIV-1 Protease Dimer with Inhibitor aha024

monomer in a dimeric configuration with an inhibitor: obtain profile for 1g35, plot 3D-1D only for 1g35A

isolated monomer: eliminate all PDB coordinate lines in 1g35 except those for 1g35A, obtain profile, plot 3D-1D

plot interface: difference between the 1g35A 3D-1D’s in the dimer and monomer configurations

Structure reference: W. Schaal, A. Karlsson, G. Ahlsen, et al., Synthesis and comparative molecular field analysis (CoMFA) of symmetric and nonsymmetric cyclic sulfamide HIV-1 protease inhibitors, J. Med. Chem. 44 (2001) 155-169
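The "isolated monomer" step can be sketched as a filter on the fixed-width PDB coordinate records, where the chain identifier sits in column 22. The sketch also pulls out the Cα coordinates that the tessellation uses. The records below are toy lines built by a hypothetical helper, not taken from 1g35, and alternate locations are ignored for simplicity.

```python
def keep_chain(pdb_lines, chain_id):
    """Drop ATOM/HETATM records of other chains; keep every other line."""
    kept = []
    for line in pdb_lines:
        is_coordinate = line.startswith(("ATOM", "HETATM"))
        if is_coordinate and line[21] != chain_id:
            continue
        kept.append(line)
    return kept


def ca_coordinates(pdb_lines, chain_id):
    """Return (residue number, x, y, z) for each C-alpha atom of one chain."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        coords.append(
            (int(line[22:26]), float(line[30:38]), float(line[38:46]), float(line[46:54]))
        )
    return coords


def atom_record(serial, name, res_name, chain_id, res_seq, x, y, z):
    """Build a toy fixed-width ATOM record (hypothetical helper for the demo)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:3s} {chain_id}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


toy = [
    atom_record(1, "N", "PRO", "A", 1, 0.0, 0.0, 0.0),
    atom_record(2, "CA", "PRO", "A", 1, 1.5, 0.0, 0.0),
    atom_record(3, "CA", "GLN", "A", 2, 3.0, 1.0, 0.5),
    atom_record(4, "CA", "PRO", "B", 1, 9.0, 9.0, 9.0),
]
monomer = keep_chain(toy, "A")
print(len(monomer))
print(ca_coordinates(monomer, "A"))
```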

Page 25: Protein Mutational Analysis Using Statistical Geometry Methods

[Figure: 1g35A Interface. x-axis: Residue Number (0 to 100); y-axis: Difference in Potential Profiles (-2 to 5).]

Observations
majority of residues forming both dimer interface and flap region exhibit increase in stability following dimerization: Q2, T4, I47-I54, T96, L97, and F99
  all h-phobic except Q2
increase in stability due to inhibitor binding evident for the active site residues D25, T26, and G27; also true for the surrounding h-phobic residues L24 and A28

Page 26: Protein Mutational Analysis Using Statistical Geometry Methods

Significance of Hydrophobic Residues in HIV-1 Protease

35/99 amino acids with scores exceeding 1.0
27 of these are hydrophobic
altogether, 44/99 amino acids in protease are hydrophobic

Assuming h-phobic residues no more likely than others (polar/charged) to have score > 1.0:
  expect (35/99) x 44, i.e. 15 or 16 h-phobics > 1.0
  $P(27\ \text{h-phobics} > 1.0) = \dfrac{44!}{27!\,17!}\left(\dfrac{35}{99}\right)^{27}\left(\dfrac{64}{99}\right)^{17} = 2.7\times10^{-4} < 0.001$, yet this is exactly what we observe!

What about other cut-off scores, and other proteins?
  applied similar test to all 996 proteins in the training set; while varying cut-off between 0.0-5.0 in 0.25 increments, binomial probabilities were calculated for each protein. For a given p-value, # of proteins with a lower significance level at each cut-off score was tabulated
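The binomial figure can be reproduced directly. The sketch below counts the hydrophobic residues (A, V, L, I, M, P, F) in the WT sequence, which is read off the residue labels P1 through F99 in the profile plots of this transcript, and evaluates the probability of exactly 27 high-scoring hydrophobic residues out of 44 when each has chance 35/99. The upper-tail probability is added as an extra check; it is not a quantity given on the slide.

```python
from math import comb

# WT HIV-1 protease monomer sequence, from the residue labels P1 ... F99.
SEQUENCE = (
    "PQITLWQRPL" "VTIKIGGQLK" "EALLDTGADD" "TVLEEMSLPG" "RWKPKMIGGI"
    "GGFIKVRQYD" "QILIEICGHK" "AIGTVLVGPT" "PVNIIGRNLL" "TQIGCTLNF"
)
HYDROPHOBIC = set("AVLIMPF")  # hydrophobic group from the Amino Acid Groups slide

n_hydrophobic = sum(aa in HYDROPHOBIC for aa in SEQUENCE)
p_high = 35 / 99  # share of all residues with score > 1.0

# Expected number of high-scoring hydrophobic residues under the null.
expected = p_high * n_hydrophobic

# Probability of exactly 27 successes, as computed on the slide.
prob_27 = comb(n_hydrophobic, 27) * p_high**27 * (1 - p_high) ** (n_hydrophobic - 27)

# Probability of 27 or more successes (extra check, not on the slide).
tail_27 = sum(
    comb(n_hydrophobic, k) * p_high**k * (1 - p_high) ** (n_hydrophobic - k)
    for k in range(27, n_hydrophobic + 1)
)

print(n_hydrophobic, expected, prob_27, tail_27)
```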

Page 27: Protein Mutational Analysis Using Statistical Geometry Methods
Page 28: Protein Mutational Analysis Using Statistical Geometry Methods

Significance of Hydrophobic Residues

optimal cut-off score for rejection of the null is clearly distinct for each of the individual proteins. Ex.: 827 proteins reject the null with a 2.0 cut-off score at p = 0.05, but 918 proteins reject the null at the same significance level if all cut-off scores are considered.

alternate approach: 92,343 h-phobic amino acids and 136,329 others (polar/charged), total of 228,672 residues in the 996 proteins; assuming no difference in the mean of the scores in both groups, apply t-test.

Result: t = 126.48, with 228,670 df => reject null!
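The reported 228,670 degrees of freedom equal 92,343 + 136,329 - 2, which is what a two-sample t-test with pooled variance gives. A minimal sketch of that statistic follows, assuming the pooled-variance form was the one used. The two samples are toy values, since the per-residue scores are not part of this transcript, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance, and its df."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Toy samples (assumed values, not residue scores from the training set).
t_stat, df = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(t_stat, df)

# Degrees of freedom for the group sizes given on the slide.
print(92_343 + 136_329 - 2)
```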

Page 29: Protein Mutational Analysis Using Statistical Geometry Methods

Acknowledgements

Iosif Vaisman (Ph.D. advisor, first to apply Delaunay to protein structure)

Zhibin Lu (Java programs for calculating statistical potentials from tessellations)

Ronald Swanstrom (experimental HIV-1 protease mutants and activity measure)