Introduction to 3D-Structure Visualization and Homology Modeling using the Swiss-Model Workspace
Lorenza Bordoli
Biozentrum of the University of Basel and Swiss Institute of Bioinformatics
Lecture 3
May 2009
Lecture 3: Outline
• Homology modeling using the Swiss-Model Workspace:
– Target sequence feature annotation
– Template identification
– Target-template alignment
– Homology modeling by template based fragment assembly
– Model quality estimation
Homology (Comparative) Modeling
Evolution of Protein Structures
Protein structure is better conserved than sequence
Homology modeling = Comparative protein modeling
Idea: Using experimental 3D-structures of related family members (templates) to calculate a model for a new sequence (target).
Similar Sequence => Similar Structure
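The "similar sequence, similar structure" rule is usually quantified as percent sequence identity over the aligned positions. The following is a minimal sketch, assuming two already aligned strings that use "-" for gaps; the function name and the toy sequences are illustrative and not part of the Swiss-Model Workspace.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment (made-up sequences, not real proteins).
target = "MKT-AYIAKQR"
template = "MKSLAYIG-QR"
print(f"{percent_identity(target, template):.1f}% identity")
```

Other conventions exist (for example dividing by the length of the shorter sequence), so the denominator used here is a design choice rather than a standard.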
Comparative Protein Structure Modeling: General Workflow
• Input: Known Structures (Templates) and Target Sequence
• Template Selection
• Alignment Template - Target
• Structure modeling
• Structure Evaluation & Assessment
• Output: Homology Model(s)
Protein structure homology modeling using SWISS-MODEL workspace
• Freely accessible: http://swissmodel.expasy.org/workspace
Comparative Protein Structure Modeling
Protein Domain annotation (InterPro)
Functional annotation of multi-domain proteins
• MSA => Model (HMM, PSSM, …) for DNA binding function
• MSA => Model for Activation Function 1
• MSA => Model for Activation Function 2
InterPro http://www.ebi.ac.uk/interpro/
InterPro is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences.
Member Databases:
– Pfam (HMMs)
– PRINTS (PSSMs)
– PROSITE (Patterns & Profiles)
– ProDom (Motifs from PsiBlast)
– SMART (HMMs)
– TIGRFAMs (HMMs)
– PANTHER (HMMs)
– Superfamily (HMMs from SCOP)
– UniProt (Sequences)
InterPro
1. Family: An InterPro family is a group of evolutionarily related proteins that share similar domain (or repeat) architecture.
2. Domain: An InterPro domain is an independent structural unit, which can be found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
3. Repeat: An InterPro repeat is a region that is not expected to fold into a globular domain on its own. For example, 6-8 copies of the WD40 repeat are needed to form a single globular domain.
4. PTM site: A post-translational modification modifies the primary protein structure. This modification may be necessary for activation or de-activation of function. Examples include glycosylation, phosphorylation, sulphation and splicing.
5. Binding site: An InterPro binding site binds chemical compounds, which themselves are not substrates for a reaction. The compound that is bound may be a required co-factor for a chemical reaction, be involved in electron transport or be involved in protein structure modification.
6. Active site: Active sites are best known as the catalytic pockets of enzymes where a substrate is bound and converted to a product, which is then released. Distant parts of a protein's primary structure may be involved in the formation of the catalytic pocket. Therefore, to describe an active site, different signatures will be needed to cover the active site residues.
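Site signatures such as PTM sites are often written as PROSITE-style patterns. As an illustration, the sketch below translates a basic subset of that pattern syntax into a Python regular expression and scans a sequence with it. It uses the N-glycosylation pattern N-{P}-[ST]-{P}; the helper names and the example sequence are made up, and the translation covers only single residues, [..] sets, {..} exclusions, x wildcards, repeat counts and the < > anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off optional anchors and a repeat count such as x(2,4) or A(3).
        match = re.fullmatch(r"(<?)([^()<>]+)(?:\((\d+(?:,\d+)?)\))?(>?)", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, repeat, end = match.groups()
        if core == "x":
            core = "."                      # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} = any residue except P
        # [ST] and single letters are already valid regex.
        part = core + ("{" + repeat + "}" if repeat else "")
        regex_parts.append(("^" if start else "") + part + ("$" if end else ""))
    return "".join(regex_parts)


def find_sites(pattern: str, sequence: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + 1 for m in regex.finditer(sequence)]


# Made-up sequence used only to exercise the pattern.
print(find_sites("N-{P}-[ST]-{P}", "MNGTAANPSLNKSA"))
```

A pattern match only flags a candidate site; short patterns like this one match by chance frequently, which is one reason InterPro combines them with profile and HMM based signatures.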
Functional and structural domain annotation
• Individual structural domains of multidomain proteins often correspond to units of distinct molecular function => you can compare the functional domain prediction with the functional annotations of detected templates.
• The sensitivity of profile-based template detection methods can be enhanced when the search is performed at the domain level rather than searching the whole protein sequence.
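Searching at the domain level means submitting each predicted domain as its own query. A minimal sketch, assuming domain boundaries are already known as 1-based inclusive residue ranges (for example from a domain annotation); the function name, domain names and coordinates are hypothetical.

```python
def split_into_domains(sequence, domains):
    """Return {domain name: sub-sequence} for 1-based inclusive (start, end) ranges."""
    parts = {}
    for name, (start, end) in domains.items():
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"domain {name!r} lies outside the sequence")
        parts[name] = sequence[start - 1:end]
    return parts


# Made-up sequence and boundaries, for illustration only.
queries = split_into_domains("MKTAYIAKQRQISFVKSHFS", {"dom1": (1, 8), "dom2": (11, 20)})
for name, subsequence in queries.items():
    print(name, subsequence)
```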
Comparative Protein Structure Modeling
Sequence feature annotation: 1) prediction of secondary structure elements, 2) disorder and 3) transmembrane (TM) regions
Target sequence feature annotation
• In the twilight zone of sequence alignments, applying secondary structure prediction to the protein of interest may help to decide whether a putative template shares essential structural features.
• Intrinsically unstructured (disordered) regions in proteins have been associated with numerous important cellular functions (e.g. cell signaling, transcriptional regulation, …).
• Prediction of disordered and transmembrane regions therefore complements the analysis of protein domain boundaries and functional annotation of the target protein.
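To give a feel for how transmembrane regions can be flagged from sequence alone, here is a sketch of a classic sliding-window hydropathy profile using the Kyte-Doolittle scale. This is a deliberately simple stand-in, not the predictor used by the Swiss-Model Workspace; the window of 19 residues and the cutoff of 1.6 follow the values commonly quoted for this scale, and the function names and test sequence are made up.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window as (1-based centre position, score)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    half = window // 2
    profile = []
    for centre in range(half, len(sequence) - half):
        segment = scores[centre - half:centre + half + 1]
        profile.append((centre + 1, sum(segment) / window))
    return profile


def candidate_tm_positions(sequence, window=19, threshold=1.6):
    """Positions whose surrounding window is hydrophobic enough to suggest a TM helix."""
    return [pos for pos, score in hydropathy_profile(sequence, window) if score >= threshold]


# Artificial sequence: a hydrophobic stretch flanked by charged residues.
toy = "D" * 20 + "L" * 21 + "K" * 20
print(candidate_tm_positions(toy))
```

Dedicated TM and disorder predictors use trained models and give more reliable boundaries; the profile above only illustrates the underlying idea.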
Swiss-Model Workspace: Sequence feature annotations
Human Rhodopsin
Comparative Protein Structure Modeling
• Protein Data Bank PDB http://www.pdb.org => Database of templates
• Separate into single chains
• Remove bad structures
• Create BLASTable database or fold library (profiles, HMMs)
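The library-building steps above can be sketched as a small filter over per-chain records. The record layout, the resolution cutoff of 3.0 Å and the minimum length of 30 residues are assumptions chosen for illustration; the actual criteria used for the Swiss-Model template library are not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chain:
    pdb_id: str
    chain_id: str
    sequence: str
    resolution: Optional[float]  # in Å; None if not reported (e.g. NMR)


def build_library(chains, max_resolution=3.0, min_length=30):
    """Keep single chains that are long enough and not too coarsely resolved."""
    kept = []
    for chain in chains:
        too_short = len(chain.sequence) < min_length
        too_coarse = chain.resolution is not None and chain.resolution > max_resolution
        if not (too_short or too_coarse):
            kept.append(chain)
    return kept


def to_fasta(chains):
    """Write the kept chains as FASTA text, one record per chain."""
    return "".join(f">{c.pdb_id}_{c.chain_id}\n{c.sequence}\n" for c in chains)


# Placeholder entries (not real PDB identifiers).
entries = [
    Chain("xxx1", "A", "M" * 50, 1.8),
    Chain("xxx1", "B", "M" * 10, 1.8),   # dropped: too short
    Chain("xxx2", "A", "K" * 40, 3.5),   # dropped: resolution above cutoff
    Chain("xxx3", "A", "G" * 35, None),  # kept: no resolution reported
]
print(to_fasta(build_library(entries)))
```

The resulting FASTA file could then be indexed for BLAST searches, for example with the `makeblastdb` program from NCBI BLAST+.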
Swiss-Model Workspace: Template Library
Comparative Protein Structure Modeling
Template selection:
• Sequence similarity (e.g. Blast, Psi-Blast, HMM-HMM alignment)
• Structure quality (resolution, experimental method)
• Experimental conditions (ligands and cofactors)
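One way to combine these criteria is a simple lexicographic ranking: higher sequence identity first, then better resolution, then presence of the relevant ligand. This is a sketch under assumed inputs; the dictionary keys, the 30% identity floor and the candidate names are hypothetical, and in practice the criteria are weighed case by case.

```python
def rank_templates(candidates, min_identity=30.0):
    """Sort usable templates: high identity, then low resolution value, then ligand-bound."""
    usable = [t for t in candidates if t["identity"] >= min_identity]
    return sorted(
        usable,
        key=lambda t: (-t["identity"], t["resolution"], not t["has_ligand"]),
    )


# Hypothetical candidates returned by a template search.
candidates = [
    {"name": "tmplA", "identity": 45.0, "resolution": 2.8, "has_ligand": False},
    {"name": "tmplB", "identity": 45.0, "resolution": 1.9, "has_ligand": True},
    {"name": "tmplC", "identity": 62.0, "resolution": 3.2, "has_ligand": False},
    {"name": "tmplD", "identity": 22.0, "resolution": 1.5, "has_ligand": True},
]
print([t["name"] for t in rank_templates(candidates)])
```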
PDB: database of protein structures
Template identification: Blast, PSI-Blast
Template detection by similarity search: unknown structure? => alignment to a similar protein with known structure => comparative modeling
Template identification using fold libraries
MSA => Model (HMM, PSSM, …) for structural domains (e.g. TF DNA binding domain) => Library of folds
Structural superposition
Example: Leucine zipper DNA binding domain of transcription factors.
Human c-fos/c-jun
Yeast bZIP
…
Structural Alignments
• Protein structure is better conserved than sequence
• Structural alignments establish equivalences between amino acid residues based on the 3D structures of two or more proteins.
• Structure alignments therefore provide information not available from sequence alignment methods alone.
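Once residue equivalences are established and the structures are superposed, their similarity is commonly summarized as the root-mean-square deviation (RMSD) over equivalent atoms, typically Cα. The sketch below assumes the two coordinate lists are already superposed and listed in equivalent order; it does not perform the superposition itself, and the coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) positions, in input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Invented Cα positions of three equivalent residues in two superposed structures.
structure_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_2 = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
print(f"RMSD = {rmsd(structure_1, structure_2):.2f} Å")
```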
Library of HMMs/Profiles for protein structures
Template identification: HMMs, Profile libraries
Template detection by similarity search: unknown structure? => alignment to HMMs/Profiles of known structures => comparative modeling
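To illustrate what a profile in such a library contains, the sketch below builds a position-specific scoring matrix (PSSM) from a tiny ungapped multiple sequence alignment and slides it along a target sequence. It assumes a uniform background frequency and a fixed pseudocount, which is a simplification; real profile and HMM methods use sequence weighting, realistic background frequencies and gap models. All sequences and names are made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa, pseudocount=1.0):
    """Per-column log-odds scores (base 2) against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*msa):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)  # gaps are ignored
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum of column scores for a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, sequence):
    """Best-scoring window as (score, 1-based start); sequence must be >= profile length."""
    width = len(pssm)
    hits = [(score(pssm, sequence[i:i + width]), i + 1)
            for i in range(len(sequence) - width + 1)]
    return max(hits)


family = ["LKELE", "LKQLE", "LRELE", "LKELQ"]  # toy alignment
hit_score, start = best_hit(build_pssm(family), "GGSLKELEGG")
print(f"best window starts at residue {start} (score {hit_score:.2f})")
```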
Swiss-Model Workspace: Template identification
Swiss-Model Workspace: DeepView Project file
Comparative Protein Structure Modeling
• Pairwise sequence alignments (Blast, PSI-BLAST)
• MSA (T-coffee) or
• HMMs/Profile based tools (HHpred)
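As a reference point for what a pairwise target-template alignment computes, here is a sketch of global alignment by dynamic programming (Needleman-Wunsch). It uses a plain match/mismatch score and a linear gap penalty, which are simplifying assumptions; the tools named above use substitution matrices, affine gap penalties and heuristics or profiles. The function name and sequences are illustrative.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (aligned_a, aligned_b, optimal score)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # leading gaps in seq_b
    for j in range(1, cols):
        score[0][j] = j * gap          # leading gaps in seq_a
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        sub = None
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if sub is not None and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_a.append(seq_a[i - 1]); aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1]); aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[-1][-1]


target_aln, template_aln, total = needleman_wunsch("GATTACA", "GATACA")
print(target_aln)
print(template_aln)
print("score:", total)
```

Several alignments can share the optimal score; the traceback returns one of them, which is one reason manual inspection in the structural context remains useful.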
Swiss-Model Workspace: target-template alignment
• The target–template sequence alignments generated by the different template database search techniques can be used as the basis for the subsequent model creation. The alignments can be downloaded as a DeepView project file, which contains the target sequence aligned to the template structure.
• The program DeepView allows you to display and analyze the alignment in the structural context of the template and to manually adjust misaligned regions.
• Once you have finished editing the alignment, save the project file on the local disk and submit it to the ‘Project Mode’ of the Modeling session for model building.
[ http://www.expasy.org/spdbv/ ]
DeepView: “Manual Modeling”
Swiss-Model Workspace: Modeling
Swiss-Model Workspace: target-template alignment
• You might also want to apply alternative sequence alignment methods (see Lecture 2) by using multiple sequence alignment programs to align the target and the template sequences together with related proteins of the same family.
• The obtained sequence alignment between target and template (and additional homologous proteins) can be submitted to the ‘Alignment mode’ of the modeling session for model building.
Swiss-Model Workspace: Modeling
Comparative Protein Structure Modeling
1. Template based fragment assembly (Composer, SWISS-MODEL)
2. Satisfaction of spatial restraints (Modeller)
Swiss-Model: Template based fragment assembly
• Find structurally conserved core regions
• Build model core
– … by averaging core template backbone atoms (weighted by local sequence similarity with the target sequence). Leave non-conserved regions (loops) for later …
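The averaging step can be sketched as a weighted mean of equivalent backbone atom positions across superposed templates. The sketch assumes the templates are already superposed, that atoms are listed in equivalent order, and that one weight per template is given; the actual SWISS-MODEL weighting is local (per region), which this simplified version does not reproduce. Names and numbers are invented.

```python
def average_core(template_coords, weights):
    """Weighted mean (x, y, z) of each equivalent atom across superposed templates."""
    if not template_coords or len(template_coords) != len(weights):
        raise ValueError("need one weight per template")
    total = sum(weights)
    n_atoms = len(template_coords[0])
    averaged = []
    for atom in range(n_atoms):
        averaged.append(tuple(
            sum(w * coords[atom][axis] for coords, w in zip(template_coords, weights)) / total
            for axis in range(3)
        ))
    return averaged


# Two templates with two equivalent backbone atoms each; the first template
# is more similar to the target and gets three times the weight.
template_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
template_2 = [(4.0, 0.0, 0.0), (1.5, 4.0, 0.0)]
print(average_core([template_1, template_2], weights=[3.0, 1.0]))
```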
• Loop (insertion) modeling
– Use the “spare part” algorithm to find compatible fragments in a Loop-Database, or “ab-initio” rebuilding (e.g. Monte Carlo, MD, GA, etc.) to build missing loops.
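The idea behind a "spare part" search can be sketched as a geometric filter: keep database fragments of the required length whose end-to-end distance fits the gap between the two anchor residues flanking the missing loop. This is a heavily simplified assumption of how such a search works; real implementations also compare anchor stem geometry and check for clashes and energies. The fragment records and tolerance are hypothetical.

```python
import math


def find_loop_candidates(anchor_n, anchor_c, loop_length, fragments, tolerance=1.0):
    """Names of fitting fragments, best end-to-end match first."""
    gap = math.dist(anchor_n, anchor_c)  # distance the loop has to bridge
    hits = []
    for fragment in fragments:
        if fragment["length"] != loop_length:
            continue
        deviation = abs(fragment["end_to_end"] - gap)
        if deviation <= tolerance:
            hits.append((deviation, fragment["name"]))
    return [name for _, name in sorted(hits)]


# Hypothetical loop database entries (distances in Å).
loop_db = [
    {"name": "frag1", "length": 4, "end_to_end": 9.0},
    {"name": "frag2", "length": 4, "end_to_end": 12.0},
    {"name": "frag3", "length": 5, "end_to_end": 9.5},
    {"name": "frag4", "length": 4, "end_to_end": 9.8},
]
print(find_loop_candidates((0.0, 0.0, 0.0), (0.0, 0.0, 9.5), 4, loop_db))
```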
• Side chain placement
– Find the most probable side chain conformation, using
  • homologous structure information
  • backbone-dependent rotamer libraries
  • energetic and packing criteria
• Rotamer Libraries
– Only a small fraction of all possible side chain conformations is observed in experimental structures
– Rotamer libraries provide an ensemble of likely conformations
– The propensity of rotamers depends on the backbone geometry.
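A backbone-dependent rotamer library can be pictured as a lookup table keyed by residue type and (ϕ,ψ) bin. The sketch below uses 30º bins and a tiny table with invented probabilities and χ angles, purely to show the lookup; it does not reproduce any published library.

```python
def bin_angle(angle, width=30):
    """Lower edge (degrees) of the bin containing the angle, wrapped to [-180, 180)."""
    return int(((angle + 180.0) % 360.0) // width) * width - 180


# Hypothetical mini-library: (residue, phi bin, psi bin) -> [(probability, (chi1, chi2))]
ROTAMER_LIBRARY = {
    ("LEU", -90, -60): [(0.60, (-65.0, 175.0)), (0.30, (-177.0, 65.0)), (0.10, (60.0, 80.0))],
    ("LEU", -150, 120): [(0.35, (-65.0, 175.0)), (0.55, (-177.0, 65.0)), (0.10, (60.0, 80.0))],
}


def most_probable_rotamer(residue, phi, psi, library=ROTAMER_LIBRARY):
    """Chi angles of the most probable rotamer for this backbone, or None if unknown."""
    rotamers = library.get((residue, bin_angle(phi), bin_angle(psi)))
    if not rotamers:
        return None
    return max(rotamers)[1]  # tuples compare by probability first


# The same residue type prefers different rotamers in helix-like and strand-like backbones.
print(most_probable_rotamer("LEU", -63.0, -43.0))
print(most_probable_rotamer("LEU", -140.0, 135.0))
```

In an actual side chain placement the library only proposes candidates; the final choice also depends on packing and energy terms, as listed above.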
Swiss-Model Workspace: Modeling results
Comparative Protein Structure Modeling
Errors in template selection or alignment result in bad models: use iterative cycles of alignment, modeling and evaluation.
Build many models, choose the best.
Comparative Protein Structure Modeling
Quality estimation:
• Stereochemistry check
• Global model quality estimation (statistical potential, e.g. DFIRE)
• Local model quality estimation (statistical potential, e.g. ANOLEA)
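Statistical (knowledge-based) potentials are commonly derived by inverse Boltzmann statistics: a feature that is seen more often in known structures than expected by chance gets a favourable (negative) energy. The sketch below shows only this general construction for counts in distance bins; it is not the DFIRE or ANOLEA formulation, and the counts, pseudocount and energy unit are assumptions.

```python
import math


def statistical_potential(observed, expected, kT=1.0, pseudocount=1.0):
    """E(bin) = -kT * ln(observed / expected), with pseudocounts to avoid log(0)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of bins")
    return [
        -kT * math.log((obs + pseudocount) / (exp + pseudocount))
        for obs, exp in zip(observed, expected)
    ]


# Invented counts of one atom-pair type in three distance bins.
energies = statistical_potential(observed=[30, 10, 5], expected=[10, 10, 20])
print([round(e, 2) for e in energies])
```

Summing such terms over a whole model gives a global score, while summing them per residue gives a local profile that can highlight problematic regions.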
Useful to identify regions with errors in geometry
• Ramachandran Plot of backbone angles (ϕ,ψ)
– favored regions
– generously allowed regions
– disallowed regions
– Amino acids with special properties:
  • PRO: ϕ ≈ −60º
  • GLY
• Similar plots for χ-angle distributions
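The backbone angles in a Ramachandran plot are torsion angles defined by four consecutive backbone atoms: ϕ by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). A minimal sketch of the torsion calculation from Cartesian coordinates follows; the helper names and test points are illustrative, and reading the atoms from a PDB file is left out.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle in degrees defined by four points (cis = 0, trans = ±180)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    b2_length = math.sqrt(_dot(b2, b2))
    y = b2_length * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Simple geometric check with points on a unit grid.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the same function to the side chain atoms N, CA, CB and the γ atom gives χ1, which is how the χ-angle plots mentioned above are obtained.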
Model quality estimation
ANOLEA (Atomic Non-Local Environment Assessment)
• http://protein.bio.puc.cl/cardex/servers/anolea/
• http://swissmodel.expasy.org/anolea/
ANOLEA detects local packing errors and errors in alignments.
[Figure: ANOLEA profiles of the correct structure (PDB: 1GES) and of a model with a wrong alignment]
Swiss-Model Workspace: Quality estimation
All checking tools are happy, so can I believe it now?
Models are not experimental facts!
Models can be partially inaccurate or sometimes completely wrong!
A model is a tool that helps to interpret biochemical data.