Computational Biology, Part 10: Protein Structure Prediction and Display
Post on 21-Dec-2015
Robert F. Murphy
Copyright 1996, 1999, 2001. All rights reserved.
Goal
Take primary structure (sequence) and, using rules derived from known structures, predict the secondary structure that is most likely to be adopted by each residue
Structural Propensities
Due to the size, shape and charge of its side chain, each amino acid may “fit” better in one type of secondary structure than another
Classic example: The rigidity and side chain angle of proline cannot be accommodated in an α-helical structure
Structural Propensities
Two ways to view the significance of this preference (or propensity)
  It may control or affect the folding of the protein in its immediate vicinity (amino acid determines structure)
  It may constitute selective pressure to use particular amino acids in regions that must have a particular structure (structure determines amino acid)
Secondary structure prediction
In either case, amino acid propensities should be useful for predicting secondary structure
Two classical methods that use previously determined propensities:
  Chou-Fasman
  Garnier-Osguthorpe-Robson
Chou-Fasman method
Uses table of conformational parameters (propensities) determined primarily from measurements of secondary structure by CD spectroscopy
Table consists of one “likelihood” for each structure for each amino acid
Chou-Fasman propensities (partial table)
Amino Acid   Pα     Pβ     Pt
Glu          1.51   0.37   0.74
Met          1.45   1.05   0.60
Ala          1.42   0.83   0.66
Val          1.06   1.70   0.50
Ile          1.08   1.60   0.50
Tyr          0.69   1.47   1.14
Pro          0.57   0.55   1.52
Gly          0.57   0.75   1.56
Chou-Fasman method
A prediction is made for each type of structure for each amino acid
  Can result in ambiguity if a region has high propensities for both helix and sheet (higher value usually chosen, with exceptions)
Chou-Fasman method
Calculation rules are somewhat ad hoc
Example: Method for helix
  Search for nucleating region where 4 out of 6 a.a. have Pα > 1.03
  Extend until 4 consecutive a.a. have an average Pα < 1.00
  If region is at least 6 a.a. long, has an average Pα > 1.03, and average Pα > average Pβ, consider region to be helix
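The helix rules above can be sketched in Python. This is only one possible reading, not a reference implementation: the slide does not say how the four-residue window is positioned during extension, so the sketch assumes the window is the candidate residue plus its three neighbours toward the region. The function name, the half-open index convention and the rule that a region may not extend back into an earlier one are assumptions made for illustration. Only the eight amino acids from the partial table are supported.

```python
# Partial propensity table from the slide (one-letter codes).
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "V": 1.06,
           "I": 1.08, "Y": 0.69, "P": 0.57, "G": 0.57}
P_BETA = {"E": 0.37, "M": 1.05, "A": 0.83, "V": 1.70,
          "I": 1.60, "Y": 1.47, "P": 0.55, "G": 0.75}


def mean(values):
    return sum(values) / len(values)


def predict_helices(seq, nucleus=6, needed=4, threshold=1.03, breaker=1.00):
    """Return (start, end) half-open index pairs for predicted helices.

    Raises KeyError for residues missing from the partial table.
    """
    pa = [P_ALPHA[r] for r in seq]
    pb = [P_BETA[r] for r in seq]
    n = len(seq)
    regions = []
    i = 0
    floor = 0  # assumption: do not extend back into an earlier region
    while i + nucleus <= n:
        # Rule 1: nucleation, at least 4 of 6 residues with P-alpha > 1.03.
        if sum(p > threshold for p in pa[i:i + nucleus]) < needed:
            i += 1
            continue
        start, end = i, i + nucleus
        # Rule 2: extend while the four-residue window that includes the
        # candidate residue still averages at least 1.00 (assumed reading).
        while end < n and mean(pa[end - 3:end + 1]) >= breaker:
            end += 1
        while start > floor and mean(pa[start - 1:start + 3]) >= breaker:
            start -= 1
        # Rule 3: length, average P-alpha, and helix-versus-sheet checks.
        avg_a = mean(pa[start:end])
        avg_b = mean(pb[start:end])
        if end - start >= 6 and avg_a > threshold and avg_a > avg_b:
            regions.append((start, end))
        floor = end
        i = end
    return regions
```

Under these assumptions, `predict_helices("GPGPEAMEAMEAGPGP")` returns `[(2, 14)]`, while `predict_helices("VIVIVI")` returns `[]` because the average Pβ exceeds the average Pα. The predicted region in the first case includes a few flanking Gly and Pro residues, which shows how window-based rules blur the ends of a helix.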
Garnier-Osguthorpe-Robson
Uses table of propensities calculated primarily from structures determined by X-ray crystallography
Table consists of one “likelihood” for each structure for each amino acid for each position in a 17 amino acid window
Garnier-Osguthorpe-Robson
Analogous to searching for “features” with a 17 amino acid wide frequency matrix
One matrix for each “feature”
  α-helix
  β-sheet
  turn
  coil
Highest scoring “feature” is found at each location
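The window-scoring idea above might look like the following Python sketch. The published GOR tables are not reproduced here: `toy_tables` builds made-up values purely so the example runs, and the names `predict_states`, `toy_tables`, `HALF` and `STATES` are hypothetical. Only the scoring loop (sum the per-position contributions across the 17-residue window for each feature, then take the highest) reflects what the slides describe.

```python
HALF = 8  # a 17-residue window: 8 residues either side of the centre
STATES = ("helix", "sheet", "turn", "coil")


def toy_tables():
    """Made-up tables for illustration only (NOT the real GOR values).

    tables[state][offset + HALF][residue] is the contribution of
    `residue` found `offset` positions away from the centre.
    """
    favoured = {"helix": "EAM", "sheet": "VIY", "turn": "PG", "coil": "STN"}
    tables = {}
    for state, residues in favoured.items():
        tables[state] = [
            {r: 1.0 / (1 + abs(offset)) for r in residues}
            for offset in range(-HALF, HALF + 1)
        ]
    return tables


def predict_states(seq, tables):
    """Assign the highest-scoring state to each position of seq."""
    prediction = []
    for i in range(len(seq)):
        scores = {}
        for state in STATES:
            total = 0.0
            for offset in range(-HALF, HALF + 1):
                j = i + offset
                if 0 <= j < len(seq):  # window positions off the ends are skipped
                    total += tables[state][offset + HALF].get(seq[j], 0.0)
            scores[state] = total
        # Ties go to the state listed first in STATES.
        prediction.append(max(STATES, key=scores.get))
    return prediction
```

With the toy tables, `predict_states("EEEEEEEEVVVVVVVV", toy_tables())` labels the first eight positions "helix" and the last eight "sheet". Near the boundary each residue's neighbours pull its score toward the other state, which is the point of using a window rather than a single per-residue value.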
Accuracy of predictions
Both methods are only about 55-65% accurate
A major reason is that while they consider the local context of each sequence element, they do not consider the global context of the sequence - the type of protein
  The same amino acids may adopt a different configuration in a cytoplasmic protein than in a membrane protein
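The slides do not say how accuracy is measured. A common convention, assumed here, is the fraction of residues whose predicted state matches the experimentally observed one. A minimal sketch, with a hypothetical function name and one-letter state codes (H, E, C) chosen only for illustration:

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

For example, `per_residue_accuracy("HHHHEECC", "HHHEEECC")` gives 0.875, since seven of the eight positions agree.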
“Adaptive” methods
Neural network methods - train network using sets of known proteins then use to predict for query sequence
  nnpredict
Homology-based methods - predict structure using rules derived only from proteins homologous to query sequence
  SOPM
  PHD
Neural Network methods
A neural network with multiple layers is presented with known sequences and structures - network is trained until it can predict those structures given those sequences
Allows network to adapt as needed (it can consider neighboring residues like GOR)
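One way a network can "consider neighboring residues" is to receive a window of sequence around each position as its input. The sketch below shows only that input-encoding step, not a network or its training. It assumes a one-hot encoding of the 20 standard amino acids and a 13-residue window; both choices, and the function name, are illustrative rather than taken from nnpredict or any other program named in the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def encode_window(seq, centre, half_width=6):
    """One-hot encode the residues from centre - half_width to centre + half_width.

    Window positions that fall off either end of the sequence are
    encoded as all zeros. Non-standard letters raise ValueError.
    """
    vector = []
    for j in range(centre - half_width, centre + half_width + 1):
        slot = [0] * len(AMINO_ACIDS)
        if 0 <= j < len(seq):
            slot[AMINO_ACIDS.index(seq[j])] = 1
        vector.extend(slot)
    return vector
```

Each call yields (2 × half_width + 1) × 20 numbers, so the default window gives 260 inputs per residue; the known secondary structure of the centre residue would serve as the training target.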
Neural Network methods
Different networks can be created for different types of proteins
Homology-based modeling
Principle: From the sequences of proteins whose structures are known, choose a subset that is similar to the query sequence
Develop rules (e.g., train a network) for just this subset
Use these rules to make prediction for the query sequence
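The first step, choosing the similar subset, could be sketched as below. The slides do not specify a similarity measure, so this assumes percent identity over sequences that have already been aligned to the query (equal length, "-" for gaps). The 30% default cutoff and both function names are placeholders, not values from SOPM or PHD.

```python
def percent_identity(a, b):
    """Percent identical residues over aligned columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def select_similar(query, known, min_identity=30.0):
    """Names of known-structure proteins similar enough to the query.

    `known` maps a protein name to its sequence aligned against the query.
    """
    return [name for name, aligned in known.items()
            if percent_identity(query, aligned) >= min_identity]
```

The selected names would then define the training set for the second step (developing rules for just this subset).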
Retrieving 3D structures
Protein Data Bank (PDB)
  using web browser
    home page = http://www.pdb.bnl.gov/
  using anonymous FTP
Entrez
  using web browser
BLAST
  using web browser
Displaying Structures with RasMol
The GIF image of Ribonuclease A is static - we cannot rotate the molecule or recolor portions of it to aid visualization
For this we can use RasMol, a public domain program available for a wide range of computers, including MacOS, Windows and Unix
Displaying Structures with RasMol
Drs. David Hackney and Will McClure have developed an online tutorial for RasMol - a link may be found on the 03-310, 03-311 and 03-510 web pages
PDB files
In order to optimally display, rotate and color the 3D structure, we need to download a copy of the coordinates for each atom in the molecule to our local computer
The most common format for storage and exchange of atomic coordinates for biological molecules is the PDB file format
PDB files
PDB file format is a text (ASCII) format, with an extensive header that can be read and interpreted either by programs or by people
We can request either the header only or the entire file; the next screen requests the header only
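Because the format is plain text with fixed columns, the per-atom coordinates can be read with ordinary string slicing. The sketch below pulls a few fields out of ATOM and HETATM records using the fixed-column layout of the PDB format; the function name and the choice of fields are illustrative, and header records are simply skipped.

```python
def parse_atoms(pdb_text):
    """Extract selected fields from ATOM/HETATM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),      # atom name, columns 13-16
                "res_name": line[17:20].strip(),  # residue name, columns 18-20
                "chain": line[21],                # chain identifier, column 22
                "res_seq": int(line[22:26]),      # residue number, columns 23-26
                "x": float(line[30:38]),          # coordinates in Angstroms,
                "y": float(line[38:46]),          # columns 31-38, 39-46
                "z": float(line[46:54]),          # and 47-54
            })
    return atoms
```

Given the text of a downloaded file, `[a for a in parse_atoms(text) if a["chain"] == "A"]` would collect the atoms of chain A.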
http://www.pdb.bnl.gov/pdb-bin/opdbshort
http://www.pdb.bnl.gov/pdb-bin/send-pdb?filename=1rat&short=1
RasMol has a graphics window and a command window
PDB Retrieval & Display
Can download PDB files from Entrez
Second example: Display structures of MHC proteins containing β2-microglobulin
Useful RasMol commands
show sequence   lists all amino acids in each chain
select *a       selects all residues in chain A
colour red      displays the selected residues in red
Alternatives to RasMol
NCBI (providers of Entrez service) have developed a public domain 3D viewer for molecules, Cn3D (“See in 3D”)
  Integrated into Network Entrez Client
  Available as a stand-alone helper application
Alternatives to RasMol
It is often useful for an investigator or teacher to be able to save a series of views of one or more molecules so that they can be replayed (creating a script for a “movie” with preprogrammed changes in rotation, color, etc.)
Two programs that do this are CHIME and MAGE
Alternatives to RasMol
CHIME (derived from RasMol source) is available as a Browser Plugin
MAGE is available as a stand-alone helper application
Information on both is available through links on a HELP page at the PDB
http://www.pdb.bnl.gov/pdb-bin/opdbshort
Structural homology
It is useful for new proteins whose 3D structure is not known to be able to find proteins whose 3D structure is known that are expected to have a similar structure to the unknown
It is also useful for proteins whose 3D structure is known to be able to find other proteins with similar structures
Finding proteins with known structures based on sequence homology
If you want to find known 3D structures of proteins that are similar in primary amino acid sequence to a particular sequence, you can use the BLAST web page and choose the PDB database
  This is not the PDB database of structures, rather a database of amino acid sequences for those proteins in the structure database
  Links are available to retrieve PDB files
Finding proteins with similar structures to a known protein
For literature and sequence databases, Entrez allows neighbors to be found for a selected entry based on “homology” in terms (MEDLINE database) or sequence (protein and nucleic acid sequence databases)
An experimental feature allows neighbors to be chosen for entries in the structure database
Finding proteins with similar structures to a known protein
Proteins with similar structures are termed “VAST Neighbors” by Entrez (VAST refers to the method used to evaluate similarity of structure)
VAST or structure neighbors may or may not have sequence homology to each other