Post on 03-Jun-2020

Introduction to 3D-Structure Visualization and Homology Modeling using the Swiss-Model Workspace

L. Bordoli
Biozentrum of the University of Basel and Swiss Institute of Bioinformatics

May 2009

Outline

• Recapitulation: properties of protein structures
– Amino acid properties
– Protein folding
– Primary, secondary, tertiary and quaternary structure
• The Protein Data Bank (PDB)
• Representation of structural information
– File formats
– Structure visualization using DeepView

• Brief recap: some properties of protein structures
– Primary structure
  • Amino acids
  • Peptide bonds
– Secondary structure
– Tertiary structure
– Quaternary structure

Recapitulation: Protein Structures

• Proteins are polypeptides (generally: polyamides)

Recapitulation: Primary Structures

• Backbone + side chains
• Carboxyl group reacts with amine group

• 20 standard L-amino acids

Recapitulation: Amino Acids

Stereochemistry: L- and D-amino acids

“L” “D”

• Neutral hydrophobic: Alanine, Valine, Leucine, Isoleucine, Proline, Tryptophan, Phenylalanine, Methionine
• Neutral polar: Glycine, Serine, Threonine, Tyrosine, Cysteine, Asparagine, Glutamine
• Acidic: Aspartic acid, Glutamic acid
• Basic: Lysine, Arginine, (Histidine)

Amino Acids: Side Chain Properties

The hydropathy index of an amino acid is a number representing the hydrophobic or hydrophilic properties of its side chain: the larger the number, the more hydrophobic the amino acid.
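As a rough illustration of how a hydropathy index is used, the sketch below averages per-residue values over a sequence and over a sliding window. It assumes the Kyte-Doolittle scale, which the slide does not name; the function names `mean_hydropathy` and `hydropathy_profile` are hypothetical helpers introduced here.

```python
# Kyte-Doolittle hydropathy values, one-letter amino acid codes.
# Assumption: the slide does not say which scale it uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy of all residues in a one-letter sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


def hydropathy_profile(sequence, window=3):
    """Mean hydropathy for every window of `window` consecutive residues."""
    return [
        mean_hydropathy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # A hydrophobic stretch followed by charged residues (made-up sequence).
    print(hydropathy_profile("ILVAKDE", window=3))
```

Positive window averages point to hydrophobic stretches, negative ones to hydrophilic stretches; the window length is a free parameter.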

• Chemical properties of standard L-amino acids:

• Approximate pKa values of side chains:
– Arg 12.5
– Lys 10.8
– Tyr 10.1
– Cys 8.3
– His 6.0
– Glu 4.1
– Asp 3.9

Ka = acid dissociation constant (pKa = −log10 Ka); the pKa is the pH at which half of the groups are deprotonated.
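To make the link between pKa and the degree of deprotonation concrete, the following sketch applies the Henderson-Hasselbalch relation to the approximate side-chain pKa values listed above. The function name `fraction_deprotonated` is a hypothetical helper, and treating each side chain as an isolated group (ignoring its protein environment) is a simplifying assumption.

```python
# Approximate side-chain pKa values as given on the slide.
SIDE_CHAIN_PKA = {
    "Arg": 12.5, "Lys": 10.8, "Tyr": 10.1, "Cys": 8.3,
    "His": 6.0, "Glu": 4.1, "Asp": 3.9,
}


def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    for residue, pka in SIDE_CHAIN_PKA.items():
        print(f"{residue}: {fraction_deprotonated(pka, 7.0):.4f} deprotonated at pH 7")
```

Under these assumptions Asp and Glu are almost fully deprotonated at pH 7, Lys and Arg remain almost fully protonated, and His sits close to the transition.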

Amino Acids: Side Chain Properties

ΔGfold = ΔH - TΔS

Energetics of protein folding

When a system changes from a well-defined initial state to a well-defined final state, the Gibbs free energy change ΔG equals the work exchanged by the system with its surroundings, less the work of the pressure forces, during a reversible transformation of the system from the same initial state to the same final state.

The enthalpy change ΔH: the heat exchanged at constant pressure (the change in internal energy plus the pressure-volume work).

The entropy change ΔS: the change in the degree of disorder of a thermodynamic system.
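A small numeric sketch of ΔGfold = ΔH − TΔS may help here. The values of ΔH and ΔS below are hypothetical, chosen only to show how two large terms can nearly cancel; they are not measurements for any real protein.

```python
def delta_g(delta_h, delta_s, temperature):
    """Gibbs free energy change: dG = dH - T * dS (consistent units required)."""
    return delta_h - temperature * delta_s


if __name__ == "__main__":
    # Hypothetical folding values (assumptions for illustration only):
    dh = -200.0   # kJ/mol, favorable enthalpy change
    ds = -0.6     # kJ/(mol*K), unfavorable entropy change of the chain
    t = 298.15    # K
    print(f"-T*dS  = {-t * ds:.2f} kJ/mol")
    print(f"dGfold = {delta_g(dh, ds, t):.2f} kJ/mol")
```

With these made-up numbers the entropic term offsets most of the enthalpic term, leaving a net ΔGfold of about −21 kJ/mol, an order of magnitude smaller than either contribution.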

Water molecules in bulk water are mobile and can form H-bonds in all directions.

Hydrophobic surfaces don’t form H-bonds. The surrounding water molecules have to orient and become more ordered.

The entropy loss can be minimized by gathering the hydrophobic surfaces together in the core of a protein and separating them from the solvent.

Protein Folding: Hydrophobic Effects

• main driving force for protein folding

• An H-bond occurs when two electronegative atoms (e.g. N, O) compete for the same hydrogen atom.

• H-bonding partners include:

– main chain atoms

– side chain atoms

– water molecules

– ligands, etc…

Protein Folding: Hydrogen Bonds

[Figure: hydrogen bond geometry, atoms labeled N, H, O and C]

Q: Do H-bonds stabilize a protein fold?

• In the unfolded state, all potential hydrogen bonding partners in the extended polypeptide chain are satisfied by hydrogen bonds to water. When the protein folds, these protein-to-water H-bonds are broken, and only some are replaced by (often sub-optimal) intra-protein H-bonds (enthalpic terms increase).

• It would appear that hydrogen bonding is destabilizing to folded protein structure

• However, one must also consider entropy. When a protein folds, and those hydrogen bonds that the protein made to bulk water are broken, the entropy of the solvent increases.

Protein Folding: Hydrogen Bonds

• The entropy and enthalpy terms are closely balanced, and until recently H-bonds were considered to make no overall contribution to protein stability.

• However, it is now generally accepted that H-bonds make a positive contribution to protein stabilization.

• We must remember that if we break or delete an intramolecular hydrogen bond in a protein without the possibility of forming a compensating H-bond to solvent, that protein will be destabilized.

Protein Folding: Hydrogen Bonds

• Difference of two very large energetic terms
• Low overall stabilization energy

Energetics of protein folding

Contributions:
• H-bonds
• Hydrophobic effects (entropy)
• Salt bridges (enthalpy)
• SS bonds
• Loss of solvation
• Entropy change
• Dispersion / VdW contacts
• Conformational energy

• Primary Structure

• Secondary Structure

• Tertiary Structure

• Quaternary Structure

Principles of protein structure

Secondary structure: the three-dimensional form of local segments of a protein, such as loops or helices.

Secondary Structures: α-Helices

• α-Helices: Every backbone N-H group donates a hydrogen bond to the backbone C=O group of the amino acid four residues earlier (i+4 -> i hydrogen bonding).
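The i+4 -> i pattern can be written out explicitly. The sketch below lists the donor/acceptor residue pairs for an idealized helix of a given length; `helix_hbond_pairs` is a hypothetical helper, and it assumes a perfectly regular helix with 1-based residue numbering.

```python
def helix_hbond_pairs(n_residues):
    """Return (donor, acceptor) residue numbers for an ideal alpha-helix.

    The backbone N-H of residue i+4 donates to the backbone C=O of residue i.
    Residues are numbered from 1; helices shorter than 5 residues have no pairs.
    """
    return [(i + 4, i) for i in range(1, n_residues - 3)]


if __name__ == "__main__":
    for donor, acceptor in helix_hbond_pairs(8):
        print(f"N-H of residue {donor} -> C=O of residue {acceptor}")
```

Note that in this idealized picture the first four N-H groups and the last four C=O groups of the helix have no partner within the helix.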

[Figure: α-helix shown in atomic, full-atom (CPK) and ribbon (cartoon) representation]

• β-sheets: beta strands connected laterally by three or more hydrogen bonds, forming a generally twisted, pleated sheet.

Secondary Structures: β-sheets

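• Most β-sheets have a left-handed twist of 0° to 30° per amino acid when viewed perpendicular to the strands (the same twist is described as right-handed when viewed along the strands).

[Figure: twisted β-sheet in bovine pancreatic trypsin inhibitor]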

Secondary Structures: β-sheets

• Parallel and anti-parallel β-sheets

Secondary Structures: β-sheets

Structural motifs

• Structural motifs (often referred to as super-secondary structures) consist of several secondary structure elements and loops.

• Examples:
– Helix-loop-helix: Consists of alpha helices connected by a looping stretch of amino acids. Important in DNA-binding proteins.
– Beta hairpin: Extremely common. Two anti-parallel beta strands connected by a tight turn of a few amino acids.
– Zinc finger: Two beta strands with an alpha helix end folded over to bind a zinc ion. This motif is seen in transcription factors.
– Greek key: Four beta strands folded over into a sandwich shape.

Peptide bonds

• Geometry of peptide bonds

[Figure: peptide bond geometry with H and R substituents]

Peptide bonds

• Definition of dihedral angles Φ, Ψ, and ω.

A dihedral angle is the angle of intersection of two planes. It is the measure of an angle having its vertex on the intersecting edge and one side in each of the planes. The sides of the angle are perpendicular to the intersecting edge.
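For molecules, the two planes are each defined by three consecutive atoms out of four, so a dihedral angle can be computed from four points. The sketch below is one common way to do this in plain Python; the function name `dihedral` and the test coordinates are illustrative assumptions. For Φ, the four points would be the backbone atoms C(i−1), N(i), Cα(i), C(i).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees between planes (p0,p1,p2) and (p1,p2,p3)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: the outer atoms are rotated by a quarter turn.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is needed to distinguish, for example, Φ = −60° from Φ = +60°.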

Peptide bonds

• Dihedral angles Φ and Ψ: the possible values are constrained geometrically by steric clashes between neighboring atoms.

[Figure: plot of Ψ (deg) against Φ (deg)]

Peptide bonds

• Ramachandran plots: the permitted values of Φ and Ψ

[Figure: Ramachandran plot, Ψ (deg) against Φ (deg)]

The tertiary structure of a protein or any other macromolecule is its three-dimensional structure, as defined by the atomic coordinates.

• Very large proteins (more than 10,000 residues are possible) rarely form one large compact structure; they are often organized into individual domains of ~200-500 residues.

• Domains: The definition of protein domains adopted here is that of compactly folded structures with their own hydrophobic core which may fold independently of the rest of the chain.

Tertiary Structure

Tertiary Structure: Domains

MAP Kinase ERK-2

Phosphorylase kinase domain

Phosphotransferase domain

• Arrangement of multiple folded protein molecules in a multi-subunit complex.
• Example: human hemoglobin, 4 chains: α2β2

Quaternary Structure

Where do we find protein structures?

http://www.wwpdb.org/

http://www.pdb.org

http://www.ebi.ac.uk/pdbe/

http://www.pdbj.org

Growth of the Protein Data Bank (PDB)

[Figure: total and yearly number of structures in the PDB. Source: http://www.pdb.org]

Representation of Structural Information

• Representation of Structural Information
– Atom types (chemical element and hybridization)
– Atom coordinates
– Atom charges (full or partial)
– Topology (connectivity of atoms)
– Chemical bond type
– Chirality and ambiguities
– Trajectories
– Surfaces and scalar fields (e.g. electrostatics)
– Identification (IUPAC name, trivial names)
– Experimental details (source of data)
– Accuracy and reliability information
– Annotation (cross references with other databases)

File formats and their limitations

• Representation of Structural Information: file formats
– SMILES
– MOL2 (Tripos Inc.)
– SDF
– PDB
– mmCIF
– PDBML

• http://www.pdb.org
• File format is column based
• Sections:
– Title
– Primary Structure
– Heterogen Section
– Secondary Structure
– Connectivity Annotation Section
– Miscellaneous Features Section
– Crystallographic and Coordinate Transformation Section
– Coordinate Section
– Connectivity Section

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
HEADER    MUSCLE PROTEIN                          02-JUN-93   1MYS
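Because the format is column based, fields are read by character position rather than by splitting on whitespace. The sketch below reads the HEADER record shown above; the column ranges (classification 11-50, deposition date 51-59, ID code 63-66) follow the published PDB format description, and `parse_header` is a hypothetical helper name.

```python
def parse_header(line):
    """Extract the fixed-column fields of a PDB HEADER record."""
    return {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59
        "id_code": line[62:66].strip(),          # columns 63-66
    }


# Rebuild the HEADER line with explicit padding so the columns are exact.
HEADER_LINE = "HEADER    " + "MUSCLE PROTEIN".ljust(40) + "02-JUN-93" + "   " + "1MYS"

if __name__ == "__main__":
    print(parse_header(HEADER_LINE))
```

Splitting on whitespace would break here, since the classification itself ("MUSCLE PROTEIN") contains a space.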

PDB file format

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
HEADER    3-EPIMERASE                             01-DEC-98   1RPX
TITLE     D-RIBULOSE-5-PHOSPHATE 3-EPIMERASE FROM SOLANUM TUBEROSUM
TITLE    2 CHLOROPLASTS
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: PROTEIN (RIBULOSE-PHOSPHATE 3-EPIMERASE);
COMPND   3 CHAIN: A, B, C;
COMPND   4 EC: 5.1.3.1;
COMPND   5 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: SOLANUM TUBEROSUM;
SOURCE   3 ORGANISM_COMMON: POTATO;
SOURCE   4 ORGANISM_TAXID: 4113;
SOURCE   5 ORGANELLE: CHLOROPLAST;
SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   7 EXPRESSION_SYSTEM_TAXID: 562
KEYWDS    3-EPIMERASE, CHLOROPLAST, CALVIN CYCLE, OXIDATIVE PENTOSE
KEYWDS   2 PHOSPHATE PATHWAY
EXPDTA    X-RAY DIFFRACTION
AUTHOR    J.KOPP,G.E.SCHULZ
REVDAT   4   24-FEB-09 1RPX    1       VERSN
REVDAT   3   01-MAR-05 1RPX    1       DBREF
REVDAT   2   01-APR-03 1RPX    1       JRNL
REVDAT   1   07-APR-99 1RPX    0
JRNL        AUTH   J.KOPP,S.KOPRIVA,K.H.SUSS,G.E.SCHULZ
JRNL        TITL   STRUCTURE AND MECHANISM OF THE AMPHIBOLIC ENZYME
JRNL        TITL 2 D-RIBULOSE-5-PHOSPHATE 3-EPIMERASE FROM POTATO
JRNL        TITL 3 CHLOROPLASTS.
JRNL        REF    J.MOL.BIOL.                   V. 287   761 1999
JRNL        REFN                   ISSN 0022-2836
JRNL        PMID   10191144
JRNL        DOI    10.1006/JMBI.1999.2643
REMARK   1
....

HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, REMARK

PDB file format

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
...
REMARK   1
REMARK   2
REMARK   2 RESOLUTION.    2.30 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT.
REMARK   3   PROGRAM     : X-PLOR 3.8.5.1
REMARK   3   AUTHORS     : BRUNGER
REMARK   3
REMARK   3  DATA USED IN REFINEMENT.
REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 2.3
REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 35.0
REMARK   3   DATA CUTOFF            (SIGMA(F)) : 0.0
REMARK   3   DATA CUTOFF HIGH         (ABS(F)) : 100000.0
REMARK   3   DATA CUTOFF LOW          (ABS(F)) : 0.001
REMARK   3   COMPLETENESS (WORKING+TEST)   (%) : 97.2
REMARK   3   NUMBER OF REFLECTIONS             : 49783
REMARK   3
REMARK   3
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   CROSS-VALIDATION METHOD          : THROUGHOUT
REMARK   3   FREE R VALUE TEST SET SELECTION  : RANDOM
REMARK   3   R VALUE            (WORKING SET) : 0.174
REMARK   3   FREE R VALUE                     : 0.212
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : 3.01
REMARK   3   FREE R VALUE TEST SET COUNT      : 1500
REMARK   3   ESTIMATED ERROR OF FREE R VALUE  : 0.005
...
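The REMARK records carry quality indicators that are worth checking before using a structure. As a minimal sketch, the resolution can be pulled out of the REMARK 2 record with a regular expression; `extract_resolution` is a hypothetical helper, and the example lines are copied from the excerpt above.

```python
import re

# Matches e.g. "REMARK   2 RESOLUTION.    2.30 ANGSTROMS."
RESOLUTION_RE = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS")


def extract_resolution(lines):
    """Return the resolution in Angstroms from REMARK 2, or None if absent."""
    for line in lines:
        match = RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None


if __name__ == "__main__":
    excerpt = [
        "REMARK   1",
        "REMARK   2",
        "REMARK   2 RESOLUTION.    2.30 ANGSTROMS.",
        "REMARK   3 REFINEMENT.",
    ]
    print(extract_resolution(excerpt))
```

Returning `None` instead of raising keeps the helper usable for entries that report no resolution in this form, for example structures not determined by diffraction.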

HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, REMARK

PDB file format

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
...
ATOM     74  N   ASP A  10      12.982  78.264  31.707  1.00 48.50           N
ATOM     75  CA  ASP A  10      14.137  79.163  31.764  1.00 46.20           C
ATOM     76  C   ASP A  10      14.910  79.105  30.460  1.00 43.70           C
ATOM     77  O   ASP A  10      14.572  78.355  29.547  1.00 45.78           O
ATOM     78  CB  ASP A  10      15.133  78.752  32.855  1.00 49.64           C
ATOM     79  CG  ASP A  10      14.471  78.300  34.129  1.00 57.95           C
ATOM     80  OD1 ASP A  10      13.809  79.129  34.788  1.00 57.91           O
ATOM     81  OD2 ASP A  10      14.651  77.114  34.487  1.00 63.05           O
...
HETATM 5200  S   SO4   231      30.451  80.354  18.252  1.00 51.91           S
HETATM 5201  O1  SO4   231      30.153  81.805  18.105  1.00 57.57           O
HETATM 5202  O2  SO4   231      31.895  80.187  18.738  1.00 54.06           O
HETATM 5203  O3  SO4   231      29.512  79.607  19.287  1.00 46.19           O
HETATM 5204  O4  SO4   231      30.193  79.714  16.846  1.00 50.16           O
...
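Coordinate records are also read by column position. The sketch below spells out the first ATOM record of the excerpt field by field and then slices it back apart; the column ranges follow the published PDB format description, and `parse_atom` is a hypothetical helper, not part of any particular library.

```python
# First ATOM record of the excerpt, written field by field so that each
# value sits in its fixed column range.
ATOM_LINE = (
    "ATOM  "      # columns  1-6:  record name
    "   74"       # columns  7-11: atom serial number
    " "           # column  12:    unused
    " N  "        # columns 13-16: atom name
    " "           # column  17:    alternate location indicator
    "ASP"         # columns 18-20: residue name
    " "           # column  21:    unused
    "A"           # column  22:    chain identifier
    "  10"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and unused columns
    "  12.982"    # columns 31-38: x coordinate (Angstroms)
    "  78.264"    # columns 39-46: y coordinate
    "  31.707"    # columns 47-54: z coordinate
    "  1.00"      # columns 55-60: occupancy
    " 48.50"      # columns 61-66: temperature factor (B-factor)
    "          "  # columns 67-76: unused
    " N"          # columns 77-78: element symbol
)


def parse_atom(line):
    """Slice an ATOM or HETATM record into its fixed-column fields."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21].strip(),
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


if __name__ == "__main__":
    print(parse_atom(ATOM_LINE))
```

The same slices work for the HETATM records, where the chain identifier column may simply be blank, as in the sulfate ion above.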

MODEL, ATOM, SIGATM, ANISOU, SIGUIJ, TER, HETATM, ENDMDL

PDB file format

x,y,z atom coordinates

Representation of Structural Information

Alanine (Ala, A)

ATOM    263  N   ALA A  35       1.429  34.959 -16.825  1.00 35.48           N
ATOM    264  CA  ALA A  35       0.523  34.398 -17.829  1.00 35.10           C
ATOM    265  C   ALA A  35      -0.724  33.878 -17.157  1.00 33.88           C
ATOM    266  O   ALA A  35      -1.850  34.138 -17.600  1.00 33.13           O
ATOM    267  CB  ALA A  35       1.209  33.268 -18.594  1.00 33.84           C
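Since the records hold Cartesian coordinates in Ångström, geometric quantities follow directly. As a small sketch, the code below computes the covalent bond lengths within the alanine residue listed above; the coordinates are copied from those records, while the variable and function names are illustrative.

```python
import math

# x, y, z coordinates (Angstroms) of ALA A 35, taken from the ATOM records.
ALA_35 = {
    "N": (1.429, 34.959, -16.825),
    "CA": (0.523, 34.398, -17.829),
    "C": (-0.724, 33.878, -17.157),
    "O": (-1.850, 34.138, -17.600),
    "CB": (1.209, 33.268, -18.594),
}

# Covalently bonded atom pairs within one alanine residue.
BONDS = [("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB")]


def bond_length(atoms, a, b):
    """Euclidean distance between two named atoms, in Angstroms."""
    return math.dist(atoms[a], atoms[b])


if __name__ == "__main__":
    for a, b in BONDS:
        print(f"{a}-{b}: {bond_length(ALA_35, a, b):.3f} A")
```

The distances come out close to typical covalent bond lengths, for example about 1.46 Å for N-Cα and about 1.24 Å for the carbonyl C=O.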

