lecture 15 homology modeling and protein threadingxiaoman/spring/lecture 15 homology... · • find...

Comparative modeling: Homology Modeling and Protein Threading Some slides modified from Kristen Huber, Umass & Charles Yan, Utah State University


Post on 09-Aug-2020








Types of Structure Prediction

• De novo protein structure prediction
  – Methods seek to build three-dimensional protein models "from scratch".
  – Example: Rosetta
• Comparative protein structure prediction
  – Modeling uses previously solved structures as starting points, or templates.
  – Example: protein threading


What is comparative modeling?

In general, comparative modeling consists of:
– Selection of one or more templates from a database
  • BLAST (for closely related sequences)
  • PSI-BLAST (for distantly related sequences)
  • A single template rarely provides a complete model; alternative template structures may contribute additional structural features.
– Alignment to the target sequence
  • Requires a correct alignment of the target and template sequences. This is not trivial, especially when sequence similarity is low.
– Refinement of side-chain geometry and of regions with low sequence identity
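Because a single template rarely covers the whole target, hits are often combined. The following is a minimal sketch of greedy multi-template selection, assuming the BLAST/PSI-BLAST hits have already been parsed into a percent identity and a covered target range. The `TemplateHit` fields, the 30% identity floor and the `min_new` rule are hypothetical choices for illustration, not how any particular modeling package selects templates.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    """One parsed database hit (hypothetical fields, e.g. from BLAST/PSI-BLAST)."""
    template_id: str
    identity: float  # percent identity over the aligned region
    start: int       # first target residue covered (1-based, inclusive)
    end: int         # last target residue covered (1-based, inclusive)


def select_templates(hits, target_length, min_identity=30.0, min_new=10):
    """Greedy multi-template selection: visit hits from highest to lowest identity
    and keep a hit only if it covers at least `min_new` target residues that the
    already chosen templates do not cover. Returns (template_ids, coverage)."""
    covered = set()
    chosen = []
    for hit in sorted(hits, key=lambda h: h.identity, reverse=True):
        if hit.identity < min_identity:
            continue  # too distant to trust as a homology template
        span = set(range(hit.start, hit.end + 1))
        if len(span - covered) >= min_new:
            chosen.append(hit.template_id)
            covered |= span
    coverage = len(covered & set(range(1, target_length + 1))) / target_length
    return chosen, coverage
```

For example, with hypothetical hits `1abc` (55%, residues 1-120), `2xyz` (40%, 100-200), `4ghi` (50%, 5-110) and `3def` (25%, 1-200) on a 200-residue target, `4ghi` adds no new coverage and `3def` falls below the floor, so `1abc` and `2xyz` are combined to cover the whole target.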


Comparative modeling

• Homology modeling
• Threading


Comparative modeling

[Figure: Comparative modeling flowchart. Sequence → sequence homology to a known fold? 30-40% or more: homology modeling → model; <30%: threading → match found? Yes: model; No: ab initio]
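The flowchart amounts to a small decision rule. A sketch, assuming the best percent identity to a template is already known (or `None` if no template was found at all); the 30% cut-off is taken from the lower end of the 30-40% band in the figure and is approximate, not a hard rule.

```python
def choose_method(identity_to_known_fold, threading_match_found):
    """Route a target sequence following the comparative modeling flowchart.

    identity_to_known_fold: best percent identity to a template, or None.
    threading_match_found: whether fold recognition found a compatible fold.
    """
    if identity_to_known_fold is not None and identity_to_known_fold >= 30.0:
        return "homology modeling"
    if threading_match_found:
        return "threading"
    return "ab initio"  # no usable template and no recognizable fold
```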


Challenges

– Aligning the target sequence onto the template structure or structures is difficult and typically introduces significant errors.

– Generally, a significant fraction of residues in a target will have no structural equivalent in an available template. Reliably building regions of the structure not present in a template remains a challenge.

– The side-chain accuracy of these approximate models is poor.

– Refinement remains the principal bottleneck to progress.


Sequence comparison is improving rapidly

Improving sequence comparison techniques have broadened the scope of comparative modeling.

While 30% sequence similarity was once considered the threshold for successful comparative modeling, predictions were made for targets with as little as 17% sequence similarity in the CASP4 experiment and 6% in CASP5.

The importance of comparative modeling will continue to grow: as the number of experimentally determined structures increases steadily, so does the number of sequences that can be related to a known structure.


Little progress in refining templates

• Comparative modeling methods hardly differ with respect to template selection and alignment.

• Little progress has been made in refining templates. Early hopes that molecular dynamics methods would allow refinement have not been fulfilled. The reasons for this are hotly debated within the field, with three suggested inter-related explanations: inadequate sampling of alternative conformations, an insufficiently accurate description of inter-atomic forces, and trajectories that are too short.


Homology Modeling Defined

– A homolog of a protein is related to it by divergent evolution from a common ancestor.

– Homology modeling is based on the reasonable assumption that two homologous proteins share very similar structures.

– Given the amino acid sequence of a protein of unknown structure and the solved structure of a homologous protein, each amino acid in the solved structure is computationally mutated into the corresponding amino acid of the target sequence.

– The accuracy of predictions by homology modeling strongly depends on the degree of sequence similarity.
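The "computational mutation" step can be illustrated on a pairwise alignment written as two gapped strings of equal length, with `-` marking a gap. This hypothetical helper only lists which template residues change identity; an actual modeling program would then rebuild the corresponding side chains in 3-D.

```python
def substitutions(template_aln, target_aln):
    """Return (template_residue_number, template_aa, target_aa) for every aligned
    column where both sequences have a residue and the amino acids differ.
    Template residues are numbered from 1, skipping gap characters."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    result = []
    template_pos = 0
    for t_aa, q_aa in zip(template_aln, target_aln):
        if t_aa != "-":
            template_pos += 1
        if t_aa != "-" and q_aa != "-" and t_aa != q_aa:
            result.append((template_pos, t_aa, q_aa))
    return result
```

For example, aligning template `ACDEFG-H` with target `ASD-FGKY` yields two mutations: template residue 2 (C to S) and residue 7 (H to Y). Columns with a gap on either side are not substitutions; they are indels that need separate treatment.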


Why Homology Modeling?
• Value in structure-based drug design
• Find common catalytic sites/molecular recognition sites
• Use as a guide to planning and interpreting experiments
• 70-80% chance that a protein has a fold similar to one already solved by X-ray crystallography or NMR spectroscopy
• Sometimes it's the only option or the best guess


Similarity of primary sequences matters 

• If the target and the template share more than 50% sequence identity, predictions are usually of high quality and have been shown to be as accurate as low-resolution X-ray structures.

• For 30-50% sequence identity, more than 80% of the Cα atoms can be expected to be within 3.5 Å of their true positions.

• For less than 30% sequence identity, the prediction is likely to contain significant errors.
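A minimal sketch of computing percent identity from a gapped pairwise alignment and mapping it onto the rough quality bands above. Identity here is counted over columns where neither sequence has a gap; other tools divide by the shorter sequence length or by the alignment length, so values are not always comparable. The band boundaries mirror the bullets and should not be read as sharp cut-offs.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)


def expected_quality(identity):
    """Map percent identity onto the approximate accuracy bands from the slide."""
    if identity > 50.0:
        return "high: comparable to a low-resolution X-ray structure"
    if identity >= 30.0:
        return "medium: >80% of C-alpha atoms expected within 3.5 Å"
    return "low: likely to contain significant errors"
```

For instance, `AC-DE` aligned with `ACQDW` has four gap-free columns, three of them identical, giving 75% identity.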


Factors affecting the quality of homology modeling

The quality of a homology model depends on the quality of the sequence alignment and of the template structure.

The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X‐ray crystallography) used to solve the structure.
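Indels can be located directly from the alignment. A sketch, assuming the alignment is given as two equal-length gapped strings (`-` marks a gap): it returns runs of target residues, in target numbering, that align to template gaps. These are the regions with no structural equivalent in the template, which must be built separately (for example by loop modeling). The function name is hypothetical.

```python
def target_insertions(template_aln, target_aln):
    """Return 1-based (start, end) ranges of target residues aligned to template gaps."""
    segments = []
    target_pos = 0
    start = None
    for t_aa, q_aa in zip(template_aln, target_aln):
        if q_aa == "-":
            continue  # no target residue in this column, nothing to build
        target_pos += 1
        if t_aa == "-":
            if start is None:
                start = target_pos  # a new run without template coordinates begins
        elif start is not None:
            segments.append((start, target_pos - 1))  # run ended at previous residue
            start = None
    if start is not None:
        segments.append((start, target_pos))  # run reaches the end of the target
    return segments
```

For example, template `ACD---EFG--` against target `ACDKLMEFGPQ` gives two segments, target residues 4-6 and 10-11.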


The quality of homology modeling

Model quality declines with decreasing sequence identity: a typical model has ~1-2 Å root mean square deviation (RMSD) between matched Cα atoms at 70% sequence identity, but only 2-4 Å agreement at 25% sequence identity. Errors are significantly higher in loop regions, where the amino acid sequences of the target and template proteins may be completely different.
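The RMSD over N matched Cα pairs is sqrt((1/N) × Σ |a_i − b_i|²), where a_i and b_i are the coordinates of the i-th matched pair. A minimal sketch, assuming the model and reference coordinates are already optimally superposed; real evaluations first superpose the two structures (for example with the Kabsch algorithm), which is omitted here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of matched (x, y, z) C-alpha
    coordinates that are assumed to be already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```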


Homology Modeling Limitations

• Cannot study conformational changes
• Cannot find new catalytic/binding sites
• Large bias towards the structure of the template
• Models cannot be docked together



Protein Threading

Protein threading, also known as fold recognition, is a computational protein structure prediction method used to model proteins that have the same fold as proteins of known structure but have no homologs of known structure.


Different from homology modeling

Protein threading differs from homology modeling in that it is used for proteins whose homologs have no structures deposited in the PDB. Protein threading predicts protein structures using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein to be modeled.


Protein Threading

• The word threading implies that one drags the sequence (ACDEFG...) step by step through each location on each template
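The "dragging" picture corresponds to gapless threading: slide the sequence along the template one position at a time and score each placement. A toy sketch, assuming each template position has been labelled buried (`B`) or exposed (`E`); the scoring rule (hydrophobic residues favored in the core, polar ones on the surface) and its values are hypothetical, not a published potential.

```python
# Hypothetical residue grouping used by the toy score below.
HYDROPHOBIC = set("AVLIMFWC")


def position_score(aa, env):
    """Toy singleton score for placing residue `aa` at a position of type `env`."""
    if env == "B":
        return 1.0 if aa in HYDROPHOBIC else -1.0
    return 0.5 if aa not in HYDROPHOBIC else -0.5


def thread_gapless(sequence, template_env):
    """Try every offset at which the whole sequence fits on the template and
    return (best_offset, best_score), or None if the sequence is too long."""
    best = None
    for offset in range(len(template_env) - len(sequence) + 1):
        score = sum(position_score(aa, template_env[offset + i])
                    for i, aa in enumerate(sequence))
        if best is None or score > best[1]:
            best = (offset, score)
    return best
```

Threading `LIVL` through the environment string `EEBBBBEE` places the four hydrophobic residues exactly on the buried stretch (offset 2). Repeating this for every template in a library and comparing best scores is the "each location on each template" search.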


Protein Threading


Why Threading (I)

• While similar sequence implies similar structure, the converse is in general not true. 

• In contrast, similar structures are often found for proteins for which no sequence similarity to any known structure can be detected. 

• As a consequence, the repertoire of different folds is more limited than suggested by sequence diversity.


Why threading (II)

• Fold recognition methods are motivated by the notion that structure is evolutionarily more conserved than sequence.

• Fold recognition methods are one class of comparative modeling methods that aim to predict the three-dimensional folded structure of amino acid sequences for which homology methods provide no reliable prediction.

• Since the number of sequences is much larger than the number of folds, fold recognition methods attempt to identify a model fold for a given target sequence among the known folds, even if no sequence similarity can be detected.


Threading

• Threading-based methods are known to be computationally expensive.

• Globally optimal protein threading is known to be NP-hard.

• Several threading methods ignore pairwise interactions between residues. This simplifies the threading problem considerably, and the simplified problem can be solved with dynamic programming.
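When pairwise terms are dropped, each residue-to-position score is independent, so an optimal alignment with gaps follows from standard dynamic programming. A sketch of a global (Needleman-Wunsch) alignment of a sequence against a 1-D string of buried/exposed environment labels, using a made-up environment score and a hypothetical linear gap penalty; real threading programs use empirically derived scores and often affine or local/semi-global gap models.

```python
HYDROPHOBIC = set("AVLIMFWC")


def env_score(aa, env):
    """Hypothetical score for residue `aa` at a position labelled
    "B" (buried core) or "E" (exposed)."""
    if env == "B":
        return 1.0 if aa in HYDROPHOBIC else -1.0
    return 0.5 if aa not in HYDROPHOBIC else -0.5


def align_to_profile(sequence, template_env, gap=-1.0):
    """Best global alignment score of `sequence` against `template_env`.
    Every column is scored independently, so DP finds the exact optimum."""
    n, m = len(sequence), len(template_env)
    # dp[i][j] = best score aligning sequence[:i] with template_env[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + env_score(sequence[i - 1], template_env[j - 1]),
                dp[i - 1][j] + gap,  # residue with no template position (insertion)
                dp[i][j - 1] + gap,  # template position left unoccupied (deletion)
            )
    return dp[n][m]
```

The table has (n + 1) × (m + 1) cells, so the cost is O(nm) per template, which is what makes scanning a whole fold library feasible once pairwise terms are ignored.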


Threading

• In early methods of this kind, a one-dimensional string of features was recorded for known folds and compared to the target sequence.

• The recorded features comprise attributes like buried side-chain area, side-chain area covered by polar atoms (including water), and the local secondary structure.

• In this manner, the three-dimensional structure of known proteins is converted into a one-dimensional sequence of descriptors, and fold recognition is reduced to seeking the most favorable sequence alignment between the query sequence and a database of these descriptor sequences.

• Recent approaches take into account pairwise residue interaction potentials that describe a mean force derived from a database of known structures.
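Adding pairwise potentials means the score of one template position depends on which residue occupies another. A minimal sketch that scores one fixed threading, assuming the template's residue contacts are given as index pairs and using a tiny hypothetical contact table over residue classes in place of a real knowledge-based (mean force) potential. Because placing a residue changes the contribution of every contact it takes part in, positions can no longer be optimized independently, which is what breaks the simple dynamic-programming solution.

```python
def residue_class(aa):
    """Coarse residue classes for the toy potential."""
    if aa in "AVLIMFWC":
        return "H"  # hydrophobic
    if aa in "DE":
        return "-"  # acidic
    if aa in "KR":
        return "+"  # basic
    return "P"      # other polar / special


# Hypothetical contact energies (negative = favorable); missing pairs score 0.
PAIR_ENERGY = {("H", "H"): -1.0, ("+", "-"): -0.8, ("+", "+"): 0.5, ("-", "-"): 0.5}


def pair_energy(a, b):
    key = tuple(sorted((residue_class(a), residue_class(b))))
    return PAIR_ENERGY.get(key, 0.0)


def threading_energy(assignment, contacts):
    """Sum pairwise energies over template contacts (i, j). `assignment[i]` is the
    residue placed at template position i, or None if the position is empty."""
    return sum(pair_energy(assignment[i], assignment[j])
               for i, j in contacts
               if assignment[i] is not None and assignment[j] is not None)
```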


Threading Methods

• Bowie, Lüthy and Eisenberg (1991)
• Two approaches to recognition methods:
  • Derive a 1-D profile for each structure in the fold library and align the target sequence to these profiles
    – Classify amino acid positions as core or external (exposed)
    – Record the secondary structure each position is part of
  • Consider the full 3-D structure of the protein template
    – Modeled as a set of inter-atomic distances
    – NP-hard (if interactions among multiple residues are included)
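The 1-D profile idea can be sketched as turning each template position into an environment label from its burial and secondary structure. This assumes per-residue relative solvent accessibility (0 to 1) and a 3-state secondary structure string are already available; the two burial levels and the 0.2 cutoff are hypothetical simplifications of the finer-grained environment classes used in the published method.

```python
def environment_class(relative_accessibility, secondary_structure, buried_cutoff=0.2):
    """Label one template position by burial and local secondary structure.

    relative_accessibility: side-chain solvent accessibility as a fraction (0..1).
    secondary_structure: "H" (helix), "E" (strand) or "C" (other).
    buried_cutoff: hypothetical threshold separating core from exposed positions.
    """
    if secondary_structure not in ("H", "E", "C"):
        raise ValueError("secondary structure must be H, E or C")
    burial = "core" if relative_accessibility < buried_cutoff else "exposed"
    return f"{burial}-{secondary_structure}"


def structure_to_profile(accessibilities, ss_string):
    """Convert a template structure into its 1-D string of environment labels."""
    return [environment_class(a, s) for a, s in zip(accessibilities, ss_string)]
```

The resulting label string is what a target sequence is then aligned against, with a score table giving the preference of each amino acid for each environment class.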


Threading based on secondary structure

• One approach to fold recognition is based on secondary structure prediction and comparison. 

• This subclass of methods is based on the observation that secondary structure similarity can exceed 80% for sequences that share less than 10% sequence similarity.

• Clearly any such approach can only be as good as the underlying secondary structure prediction method.
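A sketch of the core comparison in secondary-structure-based threading: the percentage of positions at which two 3-state strings (H, E, C) agree. It assumes the strings are already aligned position by position. The same calculation against the experimentally observed assignment is the usual Q3 measure behind secondary structure prediction accuracy figures.

```python
def ss_agreement(predicted, template):
    """Percent of positions where two equal-length 3-state strings (H/E/C) agree."""
    if len(predicted) != len(template) or not predicted:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(predicted, template))
    return 100.0 * matches / len(predicted)
```

For example, `HHHEEC` against `HHHECC` agrees at five of six positions (about 83%).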


Accuracy

Accuracy of secondary structure predictions:
– 60% (1990s)
– 76% (current)


Some Threading Programs

• 3D-PSSM (ICNET). Based on sequence profiles, solvation potentials and secondary structure.

• TOPITS (PredictProtein server) (EMBL). Based on coincidence of secondary structure and accessibility.

• UCLA-DOE Structure Prediction Server (UCLA). Executes various threading programs and reports a consensus.

• 123D+. Combines a substitution matrix, secondary structure prediction, and contact capacity potentials.

• SAM/HMM (UCSC). Based on hidden Markov models of alignments of crystallized proteins.

• FAS (Burnham Institute). Based on profile-profile matching of the query sequence with sequences from a clustered PDB database.

• PSIPRED-GenThreader (Brunel)

• THREADER2 (Warwick). Based on solvation potentials and contacts obtained from crystallized proteins.

• ProFIT CAME (Salzburg)


A more complete list

• http://en.wikipedia.org/wiki/Protein_structure_prediction_software


Comparative Modeling

• SWISS-MODEL: http://swissmodel.expasy.org//SWISS‐MODEL.html


An example of protein docking


1. Problem

Problem: Protein docking analysis between OPA1 (ligand) and SIRT-3 (receptor)

Input:
(1) PDB (Protein Data Bank) format files for OPA1 and SIRT-3
(2) Residue selection: the residue of interest is Lys228 in OPA1 (residue 228, a lysine, in the ligand OPA1)

Output: Protein docking result


1. Problem

Some basics of protein docking: docking is a method that predicts the preferred orientation of one molecule relative to a second when the two bind to form a stable complex. In general, it searches over many different binding "poses" (orientations) of the two proteins and then scores those poses. Based on the calculated scores, the software predicts how the proteins dock.
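The search-then-score idea can be illustrated with a toy rigid-body search. This sketch assumes each protein is a list of (x, y, z) atom coordinates, tries only ligand translations on a small grid (no rotations), and uses a hypothetical score that rewards atom pairs within 4.5 Å and heavily penalizes pairs closer than 3.0 Å. Tools such as HEX, ZDOCK and PatchDock use far more elaborate searches and scoring functions.

```python
import itertools
import math


def score_pose(receptor, ligand, shift, contact=4.5, clash=3.0):
    """Toy score: +1 per receptor-ligand atom pair within `contact` angstroms,
    -10 per pair closer than `clash` angstroms (steric clash)."""
    dx, dy, dz = shift
    score = 0
    for r_atom in receptor:
        for lx, ly, lz in ligand:
            d = math.dist(r_atom, (lx + dx, ly + dy, lz + dz))
            if d < clash:
                score -= 10
            elif d <= contact:
                score += 1
    return score


def search_translations(receptor, ligand, step=1.0, extent=5):
    """Exhaustively try ligand translations on a cubic grid and return the
    best (shift, score). Rotations are ignored to keep the sketch short."""
    grid = [i * step for i in range(-extent, extent + 1)]
    best = None
    for shift in itertools.product(grid, repeat=3):
        s = score_pose(receptor, ligand, shift)
        if best is None or s > best[1]:
            best = (shift, s)
    return best
```

With a single receptor atom and a single ligand atom at the origin, the best pose moves the ligand to a distance between 3.0 and 4.5 Å, the only range that scores positively.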


2. Method

Several tools can perform protein docking analysis; the following are widely used.

Global docking search tools:
2.1 HEX (http://hex.loria.fr/hex.php)
2.2 ZDOCK (http://zdock.umassmed.edu/)
2.3 PatchDock (http://bioinfo3d.cs.tau.ac.il/PatchDock/)

Local docking search tools (these take results from global docking tools as input and provide more accurate results):
2.4 RosettaDock (http://rosettadock.graylab.jhu.edu/docking2/submit)

*Note: We use a combination of global and local docking tools to get the best docking predictions.


2.1 HEX

(1) Load the receptor and ligand (in .PDB format)


2.1 HEX

(2) Set up orientation (Controls->Orientation)

Specify interface here!


2.1 HEX

(3) Set up docking (Controls->Docking)

Default docking parameters are provided. If you are not sure how to set them, just use the defaults.

Click to run docking


2.1 HEX

(4) Save results (File->Save->Both). This writes a single PDB format file containing both proteins.


2.2 ZDOCK

(1) Load the receptor and ligand

Receptor

Ligand

Your email: the results will be sent to this address


2.2 ZDOCK

(2) Set up contact and blocking residues

(3) The docking results are in PDB format


2.3 PATCHDOCK

(1) Load the receptor and ligand PDB files


2.3 PATCHDOCK

(2) Setting up the binding sites (optional)

The binding site file is a text file with the following format:
[residue index] [chain ID]

For example, for residue 199 of chain F:
199 F

If there is only one chain, just use the residue index:
199
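A minimal sketch for generating the binding-site file from Python, following the format on the slide. The function names and any output path you pass in are hypothetical.

```python
def binding_site_lines(residues, chain_id=None):
    """Format residues as '[residue index] [chain ID]' lines, or only the index
    when the structure has a single chain (chain_id=None)."""
    if chain_id is None:
        return [str(r) for r in residues]
    return [f"{r} {chain_id}" for r in residues]


def write_binding_site_file(path, residues, chain_id=None):
    """Write one binding-site entry per line to a plain text file."""
    with open(path, "w") as handle:
        handle.write("\n".join(binding_site_lines(residues, chain_id)) + "\n")
```

For example, `write_binding_site_file("site.txt", [199], "F")` writes the single line `199 F`.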


2.4 RosettaDock

(1) Load the combined PDB file. RosettaDock relies on the predictions from global search tools (such as HEX and ZDOCK).


2.4 RosettaDock

Input format description:

a. The input PDB is in a combined PDB format. When creating a single combined input file containing both partners, place a TER line between the two partners and remove all other TER lines. This can be done with a text editor, Excel or a simple script.

b. RosettaDock docks two binding chains, so keep only two chains (one from the receptor, one from the ligand). These two chains can be chosen based on the results from other software (such as HEX or ZDOCK). Rename the chain IDs to A and B.

c. Specify the docking partners. (If you renamed the chain IDs to A and B in the combined PDB file, use A_B.)
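Steps a and b can be done with a simple script. A sketch, assuming standard fixed-column PDB ATOM records (chain ID in column 22) and that you already know which receptor and ligand chains to keep; the function names are hypothetical. It keeps only ATOM lines, so HETATM, header and CONECT records are dropped, and atom serial numbers are not renumbered.

```python
def chain_atoms(pdb_text, chain_id):
    """ATOM lines belonging to one chain (chain ID is column 22, index 21)."""
    return [line for line in pdb_text.splitlines()
            if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id]


def rename_chain(lines, new_id):
    """Overwrite the chain ID column of each ATOM line."""
    return [line[:21] + new_id + line[22:] for line in lines]


def combine_for_docking(receptor_pdb, receptor_chain, ligand_pdb, ligand_chain):
    """One PDB text: receptor chain renamed to A, a single TER line, then the
    ligand chain renamed to B. All original TER lines are dropped."""
    part_a = rename_chain(chain_atoms(receptor_pdb, receptor_chain), "A")
    part_b = rename_chain(chain_atoms(ligand_pdb, ligand_chain), "B")
    return "\n".join(part_a + ["TER"] + part_b + ["END"]) + "\n"
```

The resulting file has chains A and B separated by exactly one TER line, matching the A_B partner specification in step c.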


3. Docking Results Analysis

(1) Docking results format

The output from all of these programs is in PDB format: http://deposit.rcsb.org/adit/docs/pdb_atom_format.html


3. Docking Results Analysis

(2) Visualize the results

The results can be visualized using Jmol.


3. Docking Results Analysis

To use Jmol, the output must be processed first: the output PDB needs to be converted into a combined PDB format, i.e., two models in a single PDB file.

Model 1(Receptor)

Model 12(Ligand)
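One way to produce "two models in a single PDB file" is to wrap each partner in MODEL/ENDMDL records, which Jmol reads as separate models. A sketch under that assumption; the helper names are hypothetical, and only ATOM/HETATM lines are carried over from each input.

```python
def atom_lines(pdb_text):
    """Keep coordinate records only (drops TER, END and header records)."""
    return [l for l in pdb_text.splitlines() if l.startswith(("ATOM", "HETATM"))]


def two_model_pdb(receptor_pdb, ligand_pdb):
    """Wrap receptor and ligand coordinates as MODEL 1 and MODEL 2 of one file.
    The model serial number is right-justified in columns 11-14."""
    out = []
    for number, text in ((1, receptor_pdb), (2, ligand_pdb)):
        out.append(f"MODEL     {number:>4}")
        out.extend(atom_lines(text))
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```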


3. Docking Results Analysis


3. Docking Results Analysis

(3) Use Jmol to analyze the PDB result file and obtain further information

(e.g., for SIRT3-OPA1_68000, we specify the binding sites on OPA1_68000 (ligand), obtain the predicted complex (PDB), and then try to identify the corresponding sites on the receptor (SIRT3). Jmol can be used to analyze the PDB file to get this information.)
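Finding the receptor residues next to a chosen ligand residue can also be scripted instead of inspected by eye in Jmol. A sketch, assuming a combined PDB file with standard fixed columns in which the ligand and receptor are in different chains; the function names, the 5 Å cutoff and the chain IDs in the example are hypothetical.

```python
import math


def parse_atoms(pdb_text):
    """Yield (chain, residue_number, residue_name, (x, y, z)) from ATOM/HETATM
    lines using the fixed PDB columns."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            yield (line[21], int(line[22:26]), line[17:20].strip(),
                   (float(line[30:38]), float(line[38:46]), float(line[46:54])))


def residues_near(pdb_text, ligand_chain, ligand_residue, cutoff=5.0):
    """Residues in other chains with any atom within `cutoff` angstroms of any
    atom of the chosen ligand residue, as sorted (chain, number, name) tuples."""
    atoms = list(parse_atoms(pdb_text))
    target = [xyz for ch, num, _, xyz in atoms
              if ch == ligand_chain and num == ligand_residue]
    hits = set()
    for ch, num, name, xyz in atoms:
        if ch != ligand_chain and any(math.dist(xyz, t) <= cutoff for t in target):
            hits.add((ch, num, name))
    return sorted(hits)
```

For instance, if the docked ligand were chain B and the receptor chain F, `residues_near(open("complex.pdb").read(), "B", 228)` (hypothetical file name) would list the receptor residues contacting Lys228.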


3. Docking Results Analysis

Hex+Rosetta result (for LYS228):

Best docking site: Around AA276 of chain F 

ZDock+Rosetta (for LYS228):

Best docking site: Around AA277 of chain F


3. Docking Results Analysis

Multiple docking prediction tools predict the region around AA 276 of chain F to be the best docking position for the LYS228 residue of interest.

Therefore, we conclude that LYS228 of OPA1 docks around AA 276 of chain F of SIRT-3.