
Upload: brenda-blair

Post on 17-Jan-2016


Page 1: Software Can be Grouped into Two General Classes: Protein Based Programs: ► Calculate Protein Structures  XPLOR (NIH, CNS,CXS), CYANA, CHARMM, Sybyl,

Software Can be Grouped into Two General Classes:

• Protein Based Programs:

► Calculate Protein Structures: XPLOR (XPLOR-NIH, CNS, CNX), CYANA, CHARMM, Sybyl, Amber, etc.

► Visualize Protein Structures: Quanta, Insight II, VMD-XPLOR, RasMol, Chimera, MOLMOL, PyMOL, MolScript, Swiss-PdbViewer, Jmol, etc.

► Evaluate Protein Structures: PROCHECK, MOLProbity, PROSA, WHATIF, Verify3D, iCING, PSVS, VADAR, etc.

• NMR Based Programs:

► NMR data processing: NMRPipe, ACD/NMR, Felix

► NMR data analysis/visualization: NMRDraw, NMRViewJ, PIPP, SPARKY, XEASY, CCPN-NMR

► Iterative Relaxation Matrix Calculations: IRMA, CORMA, MARDIGRAS, XPLOR, MORASS, etc.

► Automated NMR Analysis: AutoAssign, AutoStructure, ARIA, PINE, CANDID/UNIO, CS-ROSETTA, we-nmr, I-TASSER, etc.

• Not a Complete List of Software

► New software is constantly being developed.

► In a practical sense, you will only use a small subset of the available software. There is a lot of redundancy, so use what you are trained on or comfortable with.

• No Real Standards

► Different file formats cause many incompatibilities, so file manipulations are often necessary.

Software for Protein Structures by NMR


Protein NMR Based Software Programs:

• There are multiple programs that have similar functions.

• It is neither practical nor necessary to discuss the full variety of programs that are available.

• Applications will be discussed in general terms, with specific references to a limited number of programs.

Protein Based Programs: Visualize Protein Structures

• How is the protein structure stored?

► There is no uniform format. The Protein Data Bank (PDB) format is the closest thing to a uniform format, and most programs can read and/or write PDB files.

► Just about every program also has its own proprietary format. The Babel program can interconvert ~47 different structure formats.

• Common information in a protein structure:
► atoms, residues, chains
► X, Y, Z coordinates
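The following is a minimal sketch of how this common information is usually organized in memory. It groups atom records into a chain → residue → atom hierarchy. The `Atom` class, its field names and the coordinates in the example are hypothetical and do not come from any particular program:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Atom:
    name: str      # atom name, e.g. "CA" (hypothetical record layout)
    res_name: str  # residue name, e.g. "ASP"
    res_seq: int   # residue number
    chain: str     # chain identifier
    x: float
    y: float
    z: float


def build_hierarchy(atoms):
    """Group atoms as chain -> (residue number, residue name) -> [atoms]."""
    chains = defaultdict(lambda: defaultdict(list))
    for atom in atoms:
        chains[atom.chain][(atom.res_seq, atom.res_name)].append(atom)
    return chains


atoms = [
    Atom("N", "ASP", 1, "A", 0.000, 0.000, 0.000),
    Atom("CA", "ASP", 1, "A", 1.458, 0.000, 0.000),
    Atom("N", "SER", 2, "A", 2.900, 1.200, 0.000),
]
tree = build_hierarchy(atoms)
for chain_id, residues in tree.items():
    for (res_seq, res_name), members in residues.items():
        print(chain_id, res_seq, res_name, [a.name for a in members])
```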


Protein Based Programs: Visualize Protein Structures

• Protein Data Bank (PDB) format:

► Header:
– Unique PDB Identifier, Protein Name, Submission Date
– Descriptive Title of Structure
– All Compounds Present
– Source of Sample
– Authors
– Publication Information

HEADER DNA BINDING PROTEIN 08-SEP-01 1JXS
TITLE  SOLUTION STRUCTURE OF THE DNA-BINDING DOMAIN OF INTERLEUKIN
TITLE  2 ENHANCER BINDING FACTOR
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: INTERLEUKIN ENHANCER BINDING FACTOR;
COMPND 3 CHAIN: A;
COMPND 4 FRAGMENT: DNA-BINDING DOMAIN;
COMPND 5 SYNONYM: ILF-1;
COMPND 6 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 3 ORGANISM_COMMON: HUMAN;
SOURCE 4 GENE: ILF-1;
SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE 6 EXPRESSION_SYSTEM_COMMON: BACTERIA;
SOURCE 7 EXPRESSION_SYSTEM_STRAIN: BL21;
SOURCE 8 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID;
SOURCE 9 EXPRESSION_SYSTEM_PLASMID: PET21A
KEYWDS DNA-BINDING DOMAIN, WINGED HELIX
EXPDTA NMR, 20 STRUCTURES
AUTHOR W.J.CHUANG,P.P.LIU,C.LI,Y.H.HSIEH,S.W.CHEN,S.H.CHEN,W.Y.JENG
REVDAT 1 11-MAR-03 1JXS 0
JRNL   AUTH P.P.LIU,Y.C.CHEN,C.LI,Y.H.HSIEH,S.W.CHEN,S.H.CHEN,
JRNL   AUTH 2 W.Y.JENG,W.J.CHUANG
JRNL   TITL SOLUTION STRUCTURE OF THE DNA-BINDING DOMAIN OF
JRNL   TITL 2 INTERLEUKIN ENHANCER BINDING FACTOR 1 (FOXK1A)
JRNL   REF PROTEINS: V. 49 543 2002
JRNL   REF 2 STRUCT.,FUNCT.,GENET.
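PDB records use fixed columns, so a program reads each field by its position rather than by splitting on spaces. The sketch below handles the HEADER record. It assumes the column layout of the wwPDB format description: classification in columns 11-50, deposition date in 51-59 and ID code in 63-66. The listing above does not preserve the original column spacing, so the example line is rebuilt with explicit padding:

```python
def parse_header(line):
    """Read the fixed-column fields of a PDB HEADER record."""
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        "dep_date": line[50:59].strip(),        # columns 51-59
        "id_code": line[62:66].strip(),         # columns 63-66
    }


# Rebuild the 1JXS HEADER line with explicit column padding
header = f"{'HEADER':<10}{'DNA BINDING PROTEIN':<40}{'08-SEP-01':<9}{'':3}{'1JXS'}"
print(parse_header(header))
```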


Protein Based Programs: Visualize Protein Structures

• Protein Data Bank (PDB) format:

► Header:
– Description of Experimental Data

...

REMARK 210 EXPERIMENTAL DETAILS
REMARK 210 EXPERIMENT TYPE : NMR
REMARK 210 TEMPERATURE (KELVIN) : 300; 300; 300; 300
REMARK 210 PH : 6; 6; 6; 6
REMARK 210 IONIC STRENGTH : 125; 125; 125; 125
REMARK 210 PRESSURE : AMBIENT; AMBIENT; AMBIENT;
REMARK 210 AMBIENT
REMARK 210 SAMPLE CONTENTS : 3MM ILF, 25MM PHOSPHATE
REMARK 210 BUFFER, 100MM NACL; 3MM ILF,
REMARK 210 25MM PHOSPHATE BUFFER, 100MM
REMARK 210 NACL; 3MM ILF U-15N, 25MM
REMARK 210 PHOSPHATE BUFFER, 100MM NACL;
REMARK 210 2MM ILF U-15N, 13C, 25MM
REMARK 210 PHOSPHATE BUFFER, 100MM NACL
REMARK 210
REMARK 210 NMR EXPERIMENTS CONDUCTED : NOESY, DQF-COSY, TOCSY, 3D_
REMARK 210 15N-SEPARATED_NOESY, 3D_13C-
REMARK 210 SEPARATED_NOESY
REMARK 210 SPECTROMETER FIELD STRENGTH : 600 MHZ, 500 MHZ
REMARK 210 SPECTROMETER MODEL : AVANCE, DMX
REMARK 210 SPECTROMETER MANUFACTURER : BRUKER
REMARK 210
REMARK 210 STRUCTURE DETERMINATION.
REMARK 210 SOFTWARE USED : AURELIA 2.7.10, XWINNMR 2.6
REMARK 210 METHOD USED : HYBRID DISTANCE GEOMETRY-
REMARK 210 HBHA(CBCACO)NH
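The REMARK 210 block is semi-structured text, so a simple reader has to cope with values that wrap onto continuation lines. The minimal sketch below assumes, as in the listing above, that each new item contains ` : ` and that a line without it continues the previous value. Real files may need more careful handling:

```python
def parse_remark210(lines):
    """Collect 'KEY : VALUE' items from REMARK 210 lines."""
    items, key = {}, None
    for line in lines:
        if not line.startswith("REMARK 210"):
            continue
        text = line[len("REMARK 210"):].strip()
        if not text:
            key = None                    # a blank REMARK 210 line ends the item
        elif " : " in text:
            key, value = text.split(" : ", 1)
            key = key.strip()
            items[key] = value.strip()
        elif key is not None:
            items[key] += " " + text      # continuation of the previous value
    return items


remark = [
    "REMARK 210 EXPERIMENTAL DETAILS",
    "REMARK 210 EXPERIMENT TYPE : NMR",
    "REMARK 210 TEMPERATURE (KELVIN) : 300; 300; 300; 300",
    "REMARK 210 PRESSURE : AMBIENT; AMBIENT; AMBIENT;",
    "REMARK 210 AMBIENT",
    "REMARK 210",
    "REMARK 210 SPECTROMETER FIELD STRENGTH : 600 MHZ, 500 MHZ",
]
print(parse_remark210(remark))
```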


Protein Based Programs: Visualize Protein Structures

• Protein Data Bank (PDB) format:

► Header:

...

Reference to Data in other Databases

REMARK 900 RELATED ENTRIES
REMARK 900 RELATED ID: 4829 RELATED DB: BMRB
REMARK 900 1H, 15N AND 13C RESONANCE ASSIGNMENTS FOR THE DNA-BINDING
REMARK 900 DOMAIN OF INTERLEUKIN ENHANCER BINDING FACTOR
DBREF  1JXS A 1 98 SWS Q01167 ILF1_HUMAN 251 348
SEQRES 1 A 98 ASP SER LYS PRO PRO TYR SER TYR ALA GLN LEU ILE VAL
SEQRES 2 A 98 GLN ALA ILE THR MET ALA PRO ASP LYS GLN LEU THR LEU
SEQRES 3 A 98 ASN GLY ILE TYR THR HIS ILE THR LYS ASN TYR PRO TYR
SEQRES 4 A 98 TYR ARG THR ALA ASP LYS GLY TRP GLN ASN SER ILE ARG
SEQRES 5 A 98 HIS ASN LEU SER LEU ASN ARG TYR PHE ILE LYS VAL PRO
SEQRES 6 A 98 ARG SER GLN GLU GLU PRO GLY LYS GLY SER PHE TRP ARG
SEQRES 7 A 98 ILE ASP PRO ALA SER GLU SER LYS LEU ILE GLU GLN ALA
SEQRES 8 A 98 PHE ARG LYS ARG ARG PRO ARG
HELIX  1 1 ALA A 9 MET A 18 1 10
HELIX  2 2 THR A 25 TYR A 37 1 13
HELIX  3 3 TRP A 47 ASN A 58 1 12
HELIX  4 4 SER A 83 ARG A 93 1 11
SHEET  1 A 3 GLN A 23 LEU A 24 0
SHEET  2 A 3 PHE A 76 ILE A 79 -1 O TRP A 77 N LEU A 24
SHEET  3 A 3 PHE A 61 VAL A 64 -1 N VAL A 64 O PHE A 76
CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1
ORIGX1 1.000000 0.000000 0.000000 0.00000
ORIGX2 0.000000 1.000000 0.000000 0.00000
ORIGX3 0.000000 0.000000 1.000000 0.00000
SCALE1 1.000000 0.000000 0.000000 0.00000

Protein Sequence

Observed Secondary Structure Elements

Meaningless symmetry data (kept for consistency with X-ray structures)
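The SEQRES records give the full protein sequence in three-letter codes. The sketch below converts them to a one-letter string. It splits on whitespace, which works for the listing above but assumes a non-blank chain ID. A robust parser would read the fixed columns instead:

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequence(lines, chain="A"):
    """One-letter sequence of one chain from its SEQRES records."""
    residues = []
    for line in lines:
        fields = line.split()   # SEQRES, serial, chain, residue count, residues...
        if len(fields) > 4 and fields[0] == "SEQRES" and fields[2] == chain:
            residues.extend(fields[4:])
    return "".join(THREE_TO_ONE.get(res, "X") for res in residues)  # X = non-standard


seqres = [
    "SEQRES 1 A 98 ASP SER LYS PRO PRO TYR SER TYR ALA GLN LEU ILE VAL",
    "SEQRES 2 A 98 GLN ALA ILE THR MET ALA PRO ASP LYS GLN LEU THR LEU",
]
print(seqres_to_sequence(seqres))
```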


Protein Based Programs: Visualize Protein Structures

• Protein Data Bank (PDB) format:

► Coordinates:
– Atom No.
– Atom Type, Residue Type
– Chain (structures composed of multiple proteins will have a different chain for each protein)
– Residue No.
– X, Y, Z coordinates
– Atom Identifier
– Model Number (NMR structures typically have multiple models in a single PDB file)
– Identifier (4 characters)
– Occupancy, Temperature Factor

...
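The coordinate fields listed above sit at fixed column positions in every ATOM/HETATM line. The sketch below reads them and assumes the standard PDB column layout:

- serial: columns 7-11
- atom name: 13-16
- residue name: 18-20
- chain: 22
- residue number: 23-26
- X, Y, Z: 31-54
- occupancy: 55-60
- temperature factor: 61-66

The example coordinates are made up:

```python
ATOM_FORMAT = ("{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
               "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}")


def parse_atom(line):
    """Parse an ATOM/HETATM record using fixed PDB columns (0-based slices)."""
    return {
        "record": line[0:6].strip(),      # ATOM or HETATM
        "serial": int(line[6:11]),        # atom number
        "name": line[12:16].strip(),      # atom type
        "res_name": line[17:20].strip(),  # residue type
        "chain": line[21],                # chain identifier
        "res_seq": int(line[22:26]),      # residue number
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),   # temperature factor
    }


line = ATOM_FORMAT.format("ATOM", 2, " CA ", "", "ASP", "A", 1, "",
                          -10.512, 4.208, 1.337, 1.00, 0.00)
print(parse_atom(line))
```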


Protein Based Programs: Visualize Protein Structures

• Protein Data Bank (PDB) format:

► Coordinates:

► Other Features:

...

– End of Model (ENDMDL)
– End of File (END)
– HETATM Identifier (non-protein atoms: small molecules, ions, solvent, water, etc.)
– Define Specific Atom Connectivity (CONECT)
– N-Terminal NH (NH3 instead of NH)
– C-Terminal O (sometimes OXT1 & OXT2)
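An NMR entry usually stores the whole ensemble in one file, so a program has to split it at the MODEL/ENDMDL records before it can analyze individual models. The minimal sketch below does this and also keeps HETATM lines apart from protein ATOM lines. The toy lines are truncated after the first few fields, which is fine because this function only reads the record name:

```python
def split_models(lines):
    """Split PDB lines into models; each model is (ATOM lines, HETATM lines)."""
    models, atoms, hetatms = [], [], []
    for line in lines:
        record = line[0:6].strip()
        if record == "MODEL":
            atoms, hetatms = [], []        # start a new model
        elif record == "ENDMDL":
            models.append((atoms, hetatms))
        elif record == "ATOM":
            atoms.append(line)
        elif record == "HETATM":
            hetatms.append(line)           # ligands, ions, water, ...
        elif record == "END":
            break
    if not models and (atoms or hetatms):
        models.append((atoms, hetatms))    # single-model file without MODEL records
    return models


pdb_lines = [
    "MODEL        1",
    "ATOM      1  N   ASP A   1",
    "HETATM    2 ZN    ZN A 101",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   ASP A   1",
    "ENDMDL",
    "END",
]
models = split_models(pdb_lines)
print(len(models), [len(a) for a, _ in models])
```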


Protein Based Programs: Visualize Protein Structures

• Protein Data Bank (PDB) format:

► Coordinates are internally consistent: i.e., the X,Y,Z coordinates of atom A are the appropriate bond distance away from the X,Y,Z coordinates of atom B. On an absolute scale the coordinates are arbitrary: i.e., there is no defined relationship between the coordinates of protein A and protein B, even if protein A and protein B are multiple copies of the same protein.

► Alignment Issue: proteins need to be aligned for any structural comparison.
– After alignment, you can visually compare the relative orientation/position of secondary structures, active sites, bound ligands, side-chain positions, etc.
– After alignment, relative distance comparisons have meaning: i.e., if 2 helices do not overlap perfectly, a measured displacement of the helices is relevant.

Alignment requires both a rotational and a translational transformation of one coordinate frame relative to the other.
– One protein remains fixed and the other protein(s) are aligned to it.

Figure (Align): Protein A and Protein B plotted on X, Y, Z axes. Before alignment, the relative position of the 2 proteins in the X,Y,Z coordinate system is arbitrary. After alignment, the 2 proteins are centered in the same coordinate frame.
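One standard way to do the rotation-plus-translation fit described above is the Kabsch algorithm. It is swapped in here as a common method; the slides do not say which algorithm any particular program uses. The sketch assumes NumPy and two coordinate sets that list the same atoms in the same order, e.g. matched Cα atoms:

```python
import numpy as np


def kabsch_align(mobile, target):
    """Superimpose `mobile` onto the fixed `target` (both N x 3, same atom order).

    Returns the moved coordinates and the RMSD after fitting.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    # Translation: move both centroids to the origin
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    # Rotation: optimal rotation from the SVD of the covariance matrix
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = mob_c @ rot.T + target.mean(axis=0)
    rmsd = np.sqrt(((fitted - target) ** 2).sum(axis=1).mean())
    return fitted, rmsd


# Protein B is a rotated (90 degrees about z) and shifted copy of protein A
target = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                   [1.5, 1.5, 0.0], [0.0, 1.5, 1.5]])
rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
mobile = target @ rz.T + np.array([5.0, -3.0, 2.0])
fitted, rmsd = kabsch_align(mobile, target)
print(round(float(rmsd), 6))
```

Protein A stays fixed as `target` and only protein B (`mobile`) is moved, which matches the convention on the slide. Distances and RMSD values between the two copies only become meaningful after this fit.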

Protein Based Programs: Visualize Protein Structures

• Different Ways to Visualize the Same Protein Structure

► Lines/Sticks

Connect each atom coordinate position by a straight line
– Bond colored by atom type, where ½ of the bond corresponds to atom 1 and the other ½ to atom 2

Accurate representation of atom position
– Poor representation of protein packing

Crowded
– Reduce complexity by only displaying the backbone or specific regions
– Reduce complexity by zooming in on a particular region
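A viewer first has to decide which atom pairs to connect with a line. A common heuristic, assumed here rather than taken from any specific program, treats two atoms as bonded when their distance is below the sum of their covalent radii plus a tolerance. The sketch also splits each bond at its midpoint so that each half can be colored by its own atom type. The radii, the 0.45 Å tolerance and the coordinates are illustrative values:

```python
import math

# Approximate single-bond covalent radii in Å (illustrative values)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def infer_bonds(atoms, tolerance=0.45):
    """Index pairs (i, j) whose distance is within r_i + r_j + tolerance."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (el_i, pos_i), (el_j, pos_j) = atoms[i], atoms[j]
            cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tolerance
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds


def half_bonds(atoms, bond):
    """Split a bond at its midpoint; each half is drawn in its own atom's color."""
    (el_i, pos_i), (el_j, pos_j) = atoms[bond[0]], atoms[bond[1]]
    mid = tuple((a + b) / 2 for a, b in zip(pos_i, pos_j))
    return [(el_i, pos_i, mid), (el_j, mid, pos_j)]


atoms = [("N", (0.0, 0.0, 0.0)), ("C", (1.46, 0.0, 0.0)),
         ("C", (2.0, 1.43, 0.0)), ("O", (5.0, 5.0, 5.0))]
print(infer_bonds(atoms))
```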

Protein Based Programs: Visualize Protein Structures

• Different Ways to Visualize the Same Protein Structure

► Ball+Stick

Connect each atom coordinate position by a straight line
– Display each atom as a sphere

Accurate representation of atom position
– Poor representation of protein packing

Crowded
– Reduce complexity by only displaying the backbone or specific regions
– Reduce complexity by zooming in on a particular region

Protein Based Programs: Visualize Protein Structures

• Different Ways to Visualize the Same Protein Structure

► Ribbons/Cartoon

Connect each Cα atom coordinate position by a graphical representation (a smooth fit of the Cα positions)
– Not an accurate representation of atom coordinates
– Reduces the complexity of the view: no side-chains, usually only the backbone

Highlights secondary structure
– β-strands typically shown as arrows pointing in the direction of the C-terminus
– α-helices shown as thick helical coils
– random coil regions shown as tubes

Highlights overall fold and topology
– Easy comparison of fold families
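A spline that passes through every Cα atom illustrates the smooth fit of the Cα positions. The minimal sketch below uses a Catmull-Rom spline, which is an assumed choice. Programs differ in the spline type they use, and they also use peptide-plane orientation to shape the ribbon, which this sketch ignores:

```python
def catmull_rom(points, samples_per_segment=4):
    """Smooth curve through Cα positions using a Catmull-Rom spline.

    The end points are duplicated so the curve passes through every Cα.
    """
    if len(points) < 2:
        return list(points)
    pts = [points[0]] + list(points) + [points[-1]]
    curve = []
    for i in range(1, len(pts) - 2):          # one segment per pair of Cα atoms
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[i + 1], pts[i + 2]
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            t2, t3 = t * t, t * t * t
            curve.append(tuple(
                0.5 * (2 * b + (-a + c) * t + (2 * a - 5 * b + 4 * c - d) * t2
                       + (-a + 3 * b - 3 * c + d) * t3)
                for a, b, c, d in zip(p0, p1, p2, p3)))
    curve.append(tuple(points[-1]))
    return curve


ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.0, 5.0, 1.0)]
curve = catmull_rom(ca, samples_per_segment=4)
print(len(curve))
```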


Protein Based Programs: Visualize Protein Structures

• Different Ways to Visualize the Same Protein Structure

► Space Filling/van der Waals: each atom position is represented by a sphere
– the radius of the sphere is equal to the van der Waals radius
– very accurate representation of the protein

Highlights surface structure
– identify binding pockets
– cannot visualize the interior of the protein without slicing through the structure

Highlights packing
– verify the absence of "holes" in the structure
– verify tight packing of different domains, a small molecule in a binding pocket, etc.

Color coded by domain

Space filling emphasizes a hole or channel in the protein

van der Waals radii (in Å):
H     C     N     O     F     P     S     Cl
1.0   1.7   1.6   1.5   1.35  1.9   1.80  1.8
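In a space-filling model each sphere's radius is the atom's van der Waals radius, so two spheres interpenetrate when the distance between their atoms is less than the sum of the two radii. The sketch below uses the radii from the table above; the positions are made up:

```python
import math

# van der Waals radii (Å) from the table above
VDW_RADII = {"H": 1.0, "C": 1.7, "N": 1.6, "O": 1.5,
             "F": 1.35, "P": 1.9, "S": 1.80, "Cl": 1.8}


def vdw_overlap(el1, pos1, el2, pos2):
    """Sum of the two sphere radii minus the distance; > 0 means the spheres overlap."""
    return VDW_RADII[el1] + VDW_RADII[el2] - math.dist(pos1, pos2)


print(round(vdw_overlap("C", (0.0, 0.0, 0.0), "O", (3.0, 0.0, 0.0)), 2))
```

Covalently bonded atoms always overlap in this representation. For example, an N and a C atom 1.33 Å apart overlap by 1.97 Å. This is why bonded spheres merge into one continuous surface, and why only overlaps between non-bonded atoms point to bad contacts.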


Protein Based Programs: Visualize Protein Structures

• Different Ways to Visualize the Same Protein Structure

► GRASP: generates a smooth topology or shape of the protein's surface

Highlights detailed surface structure
– identify binding pockets
– cannot visualize the interior of the protein without slicing through the structure

Can map properties of the protein onto the surface:
– electrostatics
– NMR chemical shift changes
– NMR dynamics & X-ray B-factors
– conserved residues from a sequence alignment

GRASP surface of acetylcholinesterase complexed with acetylcholine, colored by potential (red negative, blue positive)

GRASP surface of MMP-1 displaying NMR chemical shift changes upon binding an inhibitor
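One common way to color a surface by an arbitrary per-residue property is to write the values into the temperature-factor (B-factor) column of the PDB file and then color by that column in the viewer. An example of such a property is the chemical shift changes shown for MMP-1. This is an assumed workflow, not a description of how GRASP works internally. The example ATOM line and the value 0.85 are made up:

```python
def set_bfactors(lines, values):
    """Write per-residue values into the B-factor field (columns 61-66).

    `values` maps (chain, residue number) -> number; residues that are not
    in the map get 0.00. Non-coordinate lines are passed through unchanged.
    """
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], int(line[22:26]))
            line = line.ljust(66)                  # make sure columns 61-66 exist
            line = f"{line[:60]}{values.get(key, 0.0):6.2f}{line[66:]}"
        out.append(line)
    return out


atom_line = "ATOM      2  CA  ASP A   1     -10.512   4.208   1.337  1.00  0.00"
print(set_bfactors([atom_line, "TER"], {("A", 1): 0.85})[0])
```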


Protein Based Programs: Evaluate Protein Structures

• Compare to known structures

• All structures have problems or errors, as determined by software analysis

► The challenge is to determine which errors, if any, are serious misinterpretations of the data and require correcting.

► Three general rules of thumb:
– If the error is severe, far outside the norm, it is probably a mistake.
– If errors cluster together, there is almost certainly a mistake.
– If the structure has an odd conformation: knot, large holes, -helix, +φ for non-Gly, etc.

Remember: the comparison is made against typical structures, so your "error" may simply represent a novel fold or conformation that has not been seen before.

Let the Data Determine the Structure


Compares bond lengths and bond angles to a database of standard small-molecule values
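A deviation from a standard value is easiest to judge in units of the standard deviation of that value. The sketch below shows the idea. The ideal backbone values are recalled from the widely used Engh & Huber parameter set and should be verified before real use. The coordinates are made up:

```python
import math

# Ideal bond length and standard deviation in Å; values recalled from the
# Engh & Huber (1991) set and meant only as an illustration
IDEAL = {("N", "CA"): (1.458, 0.019),
         ("CA", "C"): (1.525, 0.021),
         ("C", "O"): (1.231, 0.020)}


def bond_deviation(atom1, pos1, atom2, pos2):
    """Deviation of a bond length from its ideal value, in standard deviations."""
    ideal, sd = IDEAL[(atom1, atom2)]
    return (math.dist(pos1, pos2) - ideal) / sd


z = bond_deviation("N", (0.0, 0.0, 0.0), "CA", (1.50, 0.0, 0.0))
print(f"{z:.1f} standard deviations from ideal")
```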

Protein Based Programs: Evaluate Protein Structures

► PROCHECK: correct φ,ψ distribution; most residues should fall in the most favored region of the Ramachandran plot

Colored contours indicate allowed regions of the Ramachandran plot

Red contours indicate preferred region of the Ramachandran plot
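A Ramachandran plot places each residue by its backbone torsion angles φ (C(i-1)-N-Cα-C) and ψ (N-Cα-C-N(i+1)). The pure-Python sketch below shows the standard four-point dihedral calculation that these plots are built from:

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [c / length for c in b1]
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    cos_part = _dot(v, w)
    sin_part = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(sin_part, cos_part))


# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
# psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```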

Protein Based Programs: Evaluate Protein Structures

► PROCHECK: correct φ,ψ distribution as a function of residue type; most residues should fall in the preferred region of the Ramachandran plots

Dark contours are preferred regions

Protein Based Programs: Evaluate Protein Structures

► PROCHECK: comparison of main chain parameters to standard values from comparable X-ray structures; results consistent with or better than those of a structure of comparable resolution imply a reliable structure

Band indicates the range of values observed as a function of X-ray resolution

Value observed for the structure at the specified resolution; a value inside the band indicates it is consistent with other structures of similar resolution

Boxed plot is the overall G-factor, or structure quality score

Protein Based Programs: Evaluate Protein Structures

► PROCHECK: comparison of side chain parameters to standard values from comparable X-ray structures; results consistent with or better than those of a structure of comparable resolution imply a reliable structure

Band indicates the range of values observed as a function of X-ray resolution

Value observed for the structure at the specified resolution; a value inside the band indicates it is consistent with other structures of similar resolution


Protein Based Programs: Evaluate Protein Structures

► PROCHECK:
– complete list of structure violations
– per-residue plots of main chain and side-chain parameters
– a number of plots of statistical summaries of parameters

Protein Based Programs: Evaluate Protein Structures

► MOLPROBITY: provides a variety of protein structure checks by comparison to standard values from the PDB
– some overlap with Procheck
– some unique checks, including clashes and structure visualization

All-Atom Contacts

Clashscore, all atoms: 16.53 43rd percentile* (N=1784, 0Å - 9999Å)

Clashscore is the number of serious steric overlaps (> 0.4 Å) per 1000 atoms.

Protein Geometry

Rotamer outliers: 4.48% Goal: <1%

Ramachandran outliers: 2.56% Goal: <0.2%

Ramachandran favored: 94.87% Goal: >98%

Cβ deviations >0.25Å: 0 Goal: 0

MolProbity score: 2.57 43rd percentile* (N=27675, 0Å - 99Å)

Residues with bad bonds: 0.00% Goal: <1%

Residues with bad angles: 0.00% Goal: <0.5%

* 100th percentile is the best among structures of comparable resolution; 0th percentile is the worst.
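The clashscore definition and the goals in the MolProbity summary above can be applied mechanically. The sketch below computes a clashscore from hypothetical counts and flags which of the listed percentages miss their goals. The metric values are copied from the summary. MolProbity itself derives the overlap count from its all-atom contact analysis, which this sketch does not reproduce:

```python
def clashscore(n_serious_overlaps, n_atoms):
    """Serious steric overlaps (> 0.4 Å) per 1000 atoms."""
    return 1000.0 * n_serious_overlaps / n_atoms


# metric: (observed %, goal %, whether the value must stay below or above the goal)
SUMMARY = {
    "Rotamer outliers": (4.48, 1.0, "below"),
    "Ramachandran outliers": (2.56, 0.2, "below"),
    "Ramachandran favored": (94.87, 98.0, "above"),
    "Residues with bad bonds": (0.00, 1.0, "below"),
    "Residues with bad angles": (0.00, 0.5, "below"),
}


def missed_goals(summary):
    """Names of the metrics that do not meet their goal."""
    missed = []
    for name, (value, goal, side) in summary.items():
        meets = value < goal if side == "below" else value > goal
        if not meets:
            missed.append(name)
    return missed


print(clashscore(33, 2000))      # hypothetical counts
print(missed_goals(SUMMARY))
```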

Protein Based Programs: Evaluate Protein Structures

► MOLPROBITY Multi-criterion chart

per residue analysis of all problems

Protein Based Programs: Evaluate Protein Structures

► MOLPROBITY

Multi-criterion kinemage: view all problems

Bad rotamer

Bad backbone conformation

Bad clash

Choose what to display


Clash List: Atom Pair, Distance

Protein Based Programs: Evaluate Protein Structures

► MOLPROBITY Single-criterion files


Protein Based Programs: Evaluate Protein Structures

► Verify3D

Compares the primary sequence against the protein's 3D structure. Each residue's position is compared to the statistical distribution of the 20 amino acids in defined structural environments. The environments are based on the total area buried and the fraction of side-chain area covered by polar atoms.

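Figure (Structure Environments): environment classes B1, B2, B3, P1, P2 and E, plotted as fraction of side-chain area covered by polar atoms (0.00 to 0.80) against area buried (0 to 120 Å²).

Total 3D-1D score:

$$S_{\text{3D-1D}} = \sum_{\text{residues}} \ln \frac{P(i{:}j)}{P(i)}$$

where $P(i{:}j)$ is the probability of finding amino acid type $i$ in environment class $j$, and $P(i)$ is the overall probability of amino acid type $i$.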


Env. Class W F Y L I V M A G P C T S Q N E D H K R

B1 1.00 1.32 0.18 1.27 1.17 0.66 1.26 -0.66 -2.53 -1.16 -0.73 -1.29 -2.73 -1.08 -1.93 -1.74 -1.97 -0.34 -1.82 -1.67

B1 1.17 0.85 0.07 1.13 1.47 1.09 0.55 -0.79 -2.02 -0.94 -0.22 -1.12 -2.91 -1.67 -1.42 -1.93 -2.56 -1.91 -2.69 -1.16

B1 1.05 1.45 0.17 1.10 1.11 1.02 0.98 -0.91 -1.92 0.26 -1.22 -1.53 -2.81 -1.17 -2.42 -2.52 -1.76 -1.12 -2.59 -2.16

B2 0.50 0.90 0.85 1.01 0.63 0.68 1.12 -0.69 -1.49 -2.21 -0.10 -1.50 -1.47 -0.23 -0.61 -0.71 -1.62 0.23 -0.78 0.06

B2 0.01 1.18 1.06 0.76 1.31 1.06 0.64 -1.55 -2.26 -0.49 -0.87 -2.27 -1.77 -1.22 -2.07 -1.07 -1.41 -0.77 -1.14 -0.20

B2 1.02 1.05 1.12 0.84 0.81 0.60 0.90 -0.66 -1.66 0.19 -0.05 -0.76 -1.17 -0.76 -0.66 -1.35 -1.28 0.46 -2.34 -0.80

B3 0.92 -0.03 0.58 0.15 0.04 -0.02 0.89 -0.57 -1.86 -0.68 -1.56 -0.57 -0.96 0.22 -0.06 0.08 -0.50 0.73 0.43 0.96

B3 0.75 0.81 1.30 0.18 0.54 0.56 -0.57 -0.93 -1.93 -0.34 -0.54 -0.44 -0.74 0.21 -0.24 -0.14 -0.86 0.82 -0.53 0.13

B3 1.07 0.70 1.13 0.35 -0.17 -0.03 0.23 -0.96 -0.98 -0.13 -1.20 -0.53 -0.54 0.05 0.04 -0.36 -1.05 1.01 0.10 0.66

P1 -1.35 -0.82 -0.59 -0.52 -0.24 0.10 -0.03 0.73 -0.49 -0.25 0.95 0.31 0.34 -0.14 -0.54 -0.17 -0.25 -0.52 -0.21 -0.28

P1 0.36 -0.49 0.17 -1.03 0.20 0.46 -0.27 0.64 -0.82 -0.55 1.49 0.93 0.33 -2.27 -1.32 -0.73 -1.07 -0.42 -1.21 -0.77

P1 -1.26 -1.20 -1.31 -0.62 -0.23 -0.01 -1.19 0.46 -0.24 0.66 1.35 0.56 0.49 -0.63 -0.13 -0.61 0.38 -1.12 -0.74 -1.29

P2 -1.14 -1.43 -0.79 -0.35 -0.54 -0.48 -0.45 0.06 -0.50 -0.26 -0.93 -0.05 -0.18 0.55 -0.05 0.56 0.28 0.06 0.61 0.50

P2 -0.79 -0.54 -0.84 -1.30 -0.33 0.13 -0.72 -0.55 -0.98 -1.29 -0.57 0.84 0.59 -0.08 -0.16 0.32 0.19 -0.87 0.59 0.10

P2 -0.82 -0.86 -0.51 -0.70 -1.09 -0.88 -0.89 -0.15 -0.40 0.44 -0.60 0.06 0.26 0.27 0.50 0.27 0.49 0.13 0.44 0.30

E -1.35 -2.20 -2.10 -1.58 -2.76 -1.10 -0.72 0.46 0.68 0.04 -0.44 -0.17 0.15 0.36 0.28 0.59 0.44 -0.19 0.13 -0.34

E 0.64 -0.90 0.30 -1.66 -1.47 -1.74 -0.68 0.06 1.46 -0.96 -0.24 0.14 0.65 -0.19 -0.06 -0.16 -0.78 -0.83 -0.52 -0.49

E -2.14 -1.90 -0.94 -1.19 -1.61 -0.91 -1.67 0.12 1.13 0.20 -0.46 0.12 0.32 -0.03 0.41 0.03 0.22 -0.25 -0.14 -0.32

3D-1D Scoring Table (top rows: buried hydrophobic environment; bottom rows: exposed hydrophilic environment)
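Each residue's 3D-1D score is read from the table using its amino acid type (column) and environment class (row), and the total score sums these values over the chain. The sketch below uses only the first listed B1 row and the first listed E row. The sequence, the environment assignments and the window width are hypothetical. Verify3D itself reports a window-averaged profile; its actual window width is not given on these slides:

```python
AMINO_ACIDS = "WFYLIVMAGPCTSQNEDHKR"   # column order of the scoring table

# First listed row of the B1 and E classes, copied from the table above
SCORES = {
    "B1": [1.00, 1.32, 0.18, 1.27, 1.17, 0.66, 1.26, -0.66, -2.53, -1.16,
           -0.73, -1.29, -2.73, -1.08, -1.93, -1.74, -1.97, -0.34, -1.82, -1.67],
    "E": [-1.35, -2.20, -2.10, -1.58, -2.76, -1.10, -0.72, 0.46, 0.68, 0.04,
          -0.44, -0.17, 0.15, 0.36, 0.28, 0.59, 0.44, -0.19, 0.13, -0.34],
}


def residue_scores(sequence, environments):
    """Per-residue 3D-1D scores looked up by (environment class, amino acid)."""
    return [SCORES[env][AMINO_ACIDS.index(aa)]
            for aa, env in zip(sequence, environments)]


def window_average(scores, window=3):
    """Sliding-window average of per-residue scores (window width is hypothetical)."""
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - half): i + half + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


right = residue_scores("LIK", ["B1", "B1", "E"])   # hydrophobics buried, Lys exposed
wrong = residue_scores("LIK", ["E", "E", "B1"])    # the reverse
print(round(sum(right), 2), round(sum(wrong), 2))
print(window_average(right))
```

Moving the hydrophobic L and I into the exposed class E and burying K drops the total from 2.57 to -6.16. A drop of this kind is what separates a correct structure from an incorrectly modeled one in the per-residue plot.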

Protein Based Programs: Evaluate Protein Structures

► Verify3D

Example scoring function on a per residue basis

Actual X-ray structure

Incorrect modeled structure


Protein analyzed

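$$Z = \frac{X_i - \bar{X}}{\sigma}$$

where $X_i$ is the value for the protein analyzed, and $\bar{X}$ and $\sigma$ are the mean and standard deviation of the values in the comparison set.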
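The Z-score expresses how many standard deviations the analyzed protein's value lies from the mean of a comparison set. In the sketch below the reference values are made up, and using the sample (rather than population) standard deviation is an assumption:

```python
import statistics


def z_score(value, reference):
    """Z = (value - mean) / standard deviation of the reference values."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)     # sample standard deviation (assumed)
    return (value - mean) / sd


reference = [-8.0, -7.0, -6.0, -9.0, -10.0]   # made-up comparison scores
print(round(z_score(-5.0, reference), 2))
```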


Reliable structure (no strain energy)

Visualize the per-residue energy on the structure (identify problematic regions)

Protein Based Programs: Evaluate Protein Structures

► WHATIF/WHATCHECK

Provides a variety of protein structure checks by comparison to standard values from the PDB
– some overlap with Procheck
– some unique checks, including packing parameters

Unique to WHATIF/WHATCHECK:
– Check for buried unsatisfied H-bond donors and acceptors
– Peptide bond flip check
– Check for amino-acid handedness
– HIS, GLN, ASN side chain conformation check
– Check for atom nomenclature
– Side chain planarity check
– Verification of proline puckering
– New directional atomic contact analysis
– Directional atomic contact analysis

Particular to X-ray structures:
– Check for isolated water clusters
– Atomic occupancy check
– Symmetry check
– Chain name validation

Similar to Procheck:
– Verification of bond lengths
– Check for bumps (bad contacts)
– Amino-acid side chain rotamer analysis
– Torsion angle evaluation

Protein Based Programs: Evaluate Protein Structures

► WHATIF/WHATCHECK

Protein Packing Report:

Warning: Low packing Z-score for some residues
The residues listed in the table below have an unusual packing environment according to the 2nd generation quality check. The score listed in the table is a packing normality Z-score: positive means better than average, negative means worse than average. Only residues scoring less than -2.50 are listed here. These are the "unusual" residues in the structure, so it will be interesting to take a special look at them.
137 LYS ( 10 ) B -3.43
136 LYS ( 9 ) B -3.11
30 GLN ( 40 ) A -3.08
218 GLU ( 91 ) B -2.84
158 VAL ( 31 ) B -2.83
240 LYS ( 113 ) B -2.59
231 GLU ( 104 ) B -2.52

Warning: Abnormal packing Z-score for sequential residues
A stretch of at least four sequential residues with a 2nd generation packing Z-score below -1.75 was found. This could indicate that these residues are part of a strange loop or that the residues in this range are incomplete, but it might also be an indication of mis-threading. The table below lists the first and last residue in each stretch found, as well as the average residue Z-score of the series.
134 ASN ( 7 ) B --- 137 LYS ( 10 ) B -2.65

Warning: Structural average packing Z-score a bit worrisome
The structural 2nd generation average quality control value is a bit low. The protein is probably threaded correctly, but either poorly refined, or it is just a protein with an unusual (but correct) structure. The average quality of properly refined X-ray structures is 0.0+/-1.0.
All contacts : Average = -0.589 Z-score = -3.74
BB-BB contacts : Average = -0.178 Z-score = -1.27
BB-SC contacts : Average = -0.574 Z-score = -3.07
SC-BB contacts : Average = -0.240 Z-score = -1.29
SC-SC contacts : Average = -0.563 Z-score = -2.79

Protein Based Programs: Evaluate Protein Structures

► WHATIF/WHATCHECK Packing Score

► For each "fixed fragment" in a protein structure (any "largest group" of atoms that does not contain a torsion angle):

the occurrence of all possible atom types in all possible positions around the fixed fragment is counted. If a certain configuration occurs very frequently, it is assumed to be a preferred configuration. All preference counts for all atoms around a residue are used to calculate a summary score for each residue.

► Quality control score for each residue is a Z-score

Describes how well this residue feels compared to other similar residues in well refined structures. If the residue Z-score is negative, it feels less at home than the "average" residue. If the Z-score is positive, it feels more at home than average. The individual scores are not very powerful.

– A lot of structures have a few low-scoring residues. More useful are:
– the list of sequential residues that all have low scores (possibly indicating a mis-threaded segment)
– the overall quality control Z-score

► Impact on modeling by homology:

Severe. If a structure has a bad quality control Z-score, it cannot be trusted.

► Impact for NMR and crystallographer:

The global quality control value should only be low for a really misthreaded or improperly folded structure. The individual residues listed are not really rare. The most interesting table is the "residues in sequence" table:

– if that table shows any entries, have a look whether there is an alternative for the conformation of that "loop".
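The WHAT_CHECK packing report above applies two per-residue rules:
- list single residues scoring below -2.50
- flag stretches of at least four sequential residues scoring below -1.75

The sketch below applies both rules to hypothetical per-residue Z-scores. It is an illustration of the rules, not WHAT_CHECK's own code:

```python
def low_residues(z_scores, cutoff=-2.50):
    """Residue numbers whose packing Z-score is below the cutoff."""
    return [res for res, z in z_scores if z < cutoff]


def low_stretches(z_scores, cutoff=-1.75, min_length=4):
    """(first, last, mean Z) for runs of >= min_length sequential residues below cutoff.

    z_scores is a list of (residue number, Z-score) in sequence order.
    """
    stretches, run = [], []
    for res, z in z_scores + [(None, 0.0)]:      # sentinel closes the final run
        if z < cutoff and (not run or res == run[-1][0] + 1):
            run.append((res, z))
            continue
        if len(run) >= min_length:
            mean_z = sum(score for _, score in run) / len(run)
            stretches.append((run[0][0], run[-1][0], mean_z))
        run = [(res, z)] if z < cutoff else []
    return stretches


# Hypothetical per-residue Z-scores for residues 1-8
scores = list(zip(range(1, 9), [0.3, -1.9, -2.0, -2.6, -1.8, 0.5, -3.0, 0.1]))
print(low_residues(scores))
print(low_stretches(scores))
```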

Protein Based Programs: Evaluate Protein Structures

► WHATIF/WHATCHECK

Buried hydrogen bond donors and acceptors that are not involved in a hydrogen bond:
9 GLY ( 19 ) A N
11 TYR ( 21 ) A N
15 ILE ( 25 ) A O
29 ASP ( 39 ) A O
30 GLN ( 40 ) A O
31 HIS ( 41 ) A ND1
32 ILE ( 42 ) A N
33 GLN ( 43 ) A N
39 GLU ( 49 ) A O
48 SER ( 58 ) A O
60 ASP ( 70 ) A N
62 LEU ( 72 ) A N
74 LEU ( 84 ) A N
81 GLU ( 91 ) A O
84 TYR ( 94 ) A N
92 HIS ( 102 ) A NE2
101 LEU ( 111 ) A O

...

The pairs of atoms listed have an unusually short distance:
45 TYR ( 55) A CZ -- 74 LEU ( 84) A CD1 0.479 2.721 INTRA
78 ARG ( 88) A CD -- 86 THR ( 96) A CG2 0.391 2.809 INTRA
109 LEU ( 119) A O -- 110 GLY ( 120) A C 0.375 2.425 INTRA
110 GLY ( 120) A N -- 111 PRO ( 121) A CD 0.365 2.635 INTRA
131 PRO ( 4) B O -- 133 GLY ( 6) B N 0.358 2.192 INTRA BF
39 GLU ( 49) A O -- 40 SER ( 50) A CB 0.349 2.451 INTRA
109 LEU ( 119) A C -- 111 PRO ( 121) A CD 0.340 2.860 INTRA
163 ASP ( 36) B O -- 165 SER ( 38) B N 0.328 2.372 INTRA
114 HIS ( 124) A O -- 115 PHE ( 125) A C 0.328 2.472 INTRA
165 SER ( 38) B O -- 166 ASP ( 39) B C 0.303 2.497 INTRA
98 PHE ( 108) A CB -- 120 ILE ( 130) A CG1 0.297 2.903 INTRA
132 LEU ( 5) B O -- 133 GLY ( 6) B C 0.296 2.504 INTRA BF
246 LEU ( 119) B O -- 247 GLY ( 120) B C 0.295 2.505 INTRA
113 THR ( 123) A CB -- 120 ILE ( 130) A CD1 0.286 2.914 INTRA
131 PRO ( 4) B O -- 132 LEU ( 5) B C 0.282 2.518 INTRA BF
151 ARG ( 24) B NH1 -- 153 LEU ( 26) B CD2 0.278 2.822 INTRA
81 GLU ( 91) A C -- 83 GLY ( 93) A N 0.277 2.623 INTRA
96 HIS ( 106) A CD2 -- 216 LEU ( 89) B CD2 0.255 2.945 INTRA

...

Page 36: Software Can be Grouped into Two General Classes: Protein Based Programs: ► Calculate Protein Structures  XPLOR (NIH, CNS,CXS), CYANA, CHARMM, Sybyl,

Software for Protein Structures by NMR Protein Based Programs: Calculate Protein Structures

► Comparison of XPLOR and CYANA

XPLOR

Related versions: XPLOR-NIH, CNS and CNX

Calculates structures using Cartesian coordinates

Uses a modified PDB file format

Optimizes a number of specific “target functions” to refine the protein structure:
- 1H-1H distances (NOEs)
- chemical shifts (both 13C & 1H)
- coupling constants (3J(HN-Hα))
- Ramachandran database
- empirical backbone-backbone hydrogen-bonding potential
- radius of gyration
- residual dipolar coupling constants

CYANA (Combined assignment and dYnamics Algorithm for NMR Applications)

Calculates structures in torsion-angle space
- bond lengths and bond angles are kept fixed; only torsion angles are allowed to change

Advantages over XPLOR:
- faster
- higher structure convergence rate (~30% for XPLOR)

Disadvantages compared to XPLOR:
- lacks the additional target functions
- lower-quality structures
- artificially sets all parameters except torsion angles to ideal values

Page 37

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations
► First Step is to Create a Molecular Structure File for Your Specific Protein Sequence

Molecular Structure File (PSF): contains all the information needed to describe the connectivity of the protein
- contains atom/residue information (names, types, charges, masses, etc.)
- contains structure terms (bond, angle, dihedral, improper, etc.)
- does not contain atomic coordinates!

The information is obtained from two standard databases:

topallhdg_new.pro
- connectivity information for each amino acid
- the topology must be defined for ALL non-amino-acid components

parallhdg_new.pro
- defines expected values for bond lengths, bond angles, etc.

PSF patches:
- define disulphide bonds
- define cis peptide bonds

A PSF file is required for ALL XPLOR calculations. The PSF file must exactly match all the information in the structure or coordinate file (PDB file). This makes the comparison of related, but not identical, protein structures very challenging.

Page 38

An Example:

You want to compare your NMR structure with an X-ray structure you obtained from the PDB.

X-ray structure:
- does not contain hydrogens
- there is a loop that doesn’t have coordinates (no electron density)
- the structure contains a number of water molecules and detergent molecules
- identifiers are 1PDB, WAT, DET

NMR structure:
- has a His-tag at the C-terminus (an aid in purification)
- has three additional residues at the N-terminus (an artifact of the cloning process)
- the residue numbering starts at 1 instead of 185 as in the X-ray structure
- identifier is the atom type (C, H, N, O)

Your PSF file is consistent with your NMR structure, so XPLOR will give numerous errors when you try to read both the NMR and X-ray coordinate files. What are your options?

1) Make the X-ray coordinate file exactly match the NMR coordinate file:
- add hydrogens
- add dummy coordinates for the missing loop region
- remove all the water molecules and detergent molecules
- change identifier

2) Make the NMR coordinate file exactly match the X-ray coordinate file and create a new PSF file consistent with the X-ray structure:
- remove hydrogens and extra residues not present in the X-ray structure
- renumber the residues and atoms
- change identifier
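Before choosing either option, it can help to list exactly which atoms differ between the two coordinate files. The sketch below is a minimal illustration, assuming standard fixed-column PDB ATOM/HETATM records (atom name in columns 13-16, residue name 18-20, residue number 23-26). The function names `atom_keys` and `compare_atoms`, the hydrogen test by atom-name prefix, and the default skipped residue names are hypothetical choices for this example, not XPLOR features.

```python
def atom_keys(pdb_text, offset=0, drop_hydrogens=False, skip_resnames=()):
    """Collect (residue number, residue name, atom name) keys from PDB text.

    offset is added to every residue number (e.g. to map 1 -> 185).
    Hydrogens are detected heuristically: the atom name, after any leading
    digits, starts with "H".
    """
    keys = set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        resname = line[17:20].strip()
        resid = int(line[22:26]) + offset
        if resname in skip_resnames:
            continue
        if drop_hydrogens and name.lstrip("0123456789").startswith("H"):
            continue
        keys.add((resid, resname, name))
    return keys


def compare_atoms(nmr_text, xray_text, offset=0,
                  skip_resnames=("WAT", "HOH", "DET")):
    """Return (atoms only in the NMR file, atoms only in the X-ray file)."""
    nmr = atom_keys(nmr_text, offset=offset, drop_hydrogens=True)
    xray = atom_keys(xray_text, drop_hydrogens=True,
                     skip_resnames=skip_resnames)
    return sorted(nmr - xray), sorted(xray - nmr)
```

With a renumbering offset (184 would be a hypothetical choice; the right value depends on how the extra N-terminal residues are numbered), whatever remains in the first list (for example the His-tag) and in the second list (for example atoms of the loop) is what option 1 or option 2 has to reconcile.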

Page 39

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

topallhdg_new.pro

Partial list of atomic masses:
mass H 1.008
mass C 12.011
mass N 14.007
mass O 15.999

residue ALA
  group
    atom N   type=NH1 charge=-0.36 end
    atom HN  type=H   charge= 0.26 end
  group
    atom CA  type=CT  charge= 0.00 end
    atom HA  type=HA  charge= 0.10 end
  group
    atom CB  type=CT  charge=-0.30 end
    atom HB1 type=HA  charge= 0.10 end
    atom HB2 type=HA  charge= 0.10 end
    atom HB3 type=HA  charge= 0.10 end
  group
    atom C   type=C   charge= 0.48 end
    atom O   type=O   charge=-0.48 end

  bond N HN
  bond N CA
  bond CA HA
  bond CA CB
  bond CB HB1
  bond CB HB2
  bond CB HB3
  bond CA C
  bond C O

  improper HA N C CB        !stereo CA
  improper HB1 HB2 CA HB3   !stereo CB
end

group/atom statements: define and group all atoms, and assign each a type and charge

bond statements: define pairs of bonded atoms

improper statements: define a group of four atoms comprising an improper torsion angle

Page 40

Atom Types: all atoms that have the same structural properties (i.e. the same bond lengths, bond angles and dihedrals) are assigned the same atom type. This simplifies the assignment of structural parameters while keeping unique atom identifiers.

Improper: an artificial dihedral definition used primarily to maintain a planar arrangement of atoms or the proper stereochemistry in the structure (peptide bond, aromatic rings, etc.). It does not follow the linear connectivity of a “proper” dihedral angle.

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

topallhdg_new.pro

The bond lengths and bond angles for CA-HA, CB-HB1, CB-HB2 and CB-HB3 are identical, so all are defined with the atom types CT-HA.

Atoms defined by an improper angle to maintain proper stereochemistry are boxed. The improper angle is usually set to either 0° or 180°.
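An improper is measured with the same dihedral formula as a proper torsion; only the choice of the four atoms differs. The minimal sketch below uses the standard atan2 dihedral formula (IUPAC sign convention) on hypothetical coordinates to show that four coplanar atoms give 0° or 180°, which is why those two values serve as ideal improper angles for planar groups.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)          # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four coplanar atoms (all z = 0): the angle is 0 or 180 degrees.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))    # -> 0.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))   # -> 180.0
# Moving the last atom out of the plane gives a non-zero angle.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))    # -> -90.0
```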

Page 41

bond H NA $kbon 0.98
bond CT CT $kbon 1.53

angle HA CT C $kang 109.5
angle CA CA CT $kang 120.0

improper H X X C $kpla 0 0.0
improper C X X C $kpla 0 0.0

dihedral CA CA CT CT $kdih 3 0.0
dihedral NA CC CT CT $kdih 3 0.0

NONbonded C 0.0903 3.2072 0.0903 3.2072
NONBonded CA 0.120 3.2072 0.120 3.2072

nbfix H O 44.2 1.0 44.2 1.0
nbfix H OC 44.2 1.0 44.2 1.0

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

parallhdg_new.pro

bond, angle, improper and dihedral statements: list all possible combinations of atom types with their ideal values, force constants and multiplicities. The force constant ($kbon, $kang, $kpla, $kdih) comes first; for bonds and angles it is followed by the ideal value, and for impropers and dihedrals by the multiplicity and then the ideal value.

NONbonded statements: parameterization of the van der Waals equation for atom-atom contacts.

nbfix statements: parameterization of hydrogen-bond interactions.
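To see how a parameter line turns into an energy, the sketch below evaluates the usual functional forms for these terms: harmonic bonds and angles, a periodic dihedral k(1 + cos(nφ + δ)) when the multiplicity n > 0, and a harmonic term about δ when n = 0 (the case used for impropers). These forms are stated here as assumptions for illustration; XPLOR's exact conventions (for example the sign of the phase δ) should be checked against its own documentation, and the force-constant values below are hypothetical stand-ins for $kbon, $kdih and $kpla.

```python
import math


def bond_energy(r, k, r0):
    """Harmonic bond term k (r - r0)^2 (r, r0 in Angstrom)."""
    return k * (r - r0) ** 2


def angle_energy(theta_deg, k, theta0_deg):
    """Harmonic angle term k (theta - theta0)^2 with the difference in radians."""
    return k * math.radians(theta_deg - theta0_deg) ** 2


def dihedral_energy(phi_deg, k, n, delta_deg):
    """Periodic term k (1 + cos(n phi + delta)) for n > 0;
    harmonic term k (phi - delta)^2 about delta for n == 0 (impropers)."""
    if n > 0:
        return k * (1.0 + math.cos(math.radians(n * phi_deg + delta_deg)))
    # wrap the deviation into [-180, 180) so 179 vs -179 counts as 2 degrees
    dev = (phi_deg - delta_deg + 180.0) % 360.0 - 180.0
    return k * math.radians(dev) ** 2


# Hypothetical force constants standing in for $kbon, $kdih and $kpla.
print(bond_energy(1.00, 1000.0, 0.98))        # bond H NA stretched by 0.02 Å
print(dihedral_energy(180.0, 5.0, 3, 0.0))    # staggered: near-zero energy
print(dihedral_energy(5.0, 500.0, 0, 0.0))    # improper 5 degrees off planar
```

The multiplicity of 3 with a phase of 0.0 (as in the CA CA CT CT entry) gives three equivalent minima, at 60°, 180° and -60°, which is why multiplicity is stored alongside the force constant.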

Page 42

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

parallhdg_new.pro

Defining atomic parameters is a very active area of molecular modeling research. The values in the parameter database come from multiple sources:
- X-ray database of high-resolution small-molecule structures
- ab initio calculations
- experimental observations: IR, Raman, water-ion neutron and X-ray diffraction data, free energy of solvation data, etc.

Page 43

Protein Structures from an NMR Perspective

Distribution of Bond Distances in Protein Hydrogen Bonds

Page 44

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

XPLOR PSF Script

remarks build psf file
rtf @/PROGRAMS/xplor-nih-2.9.1/toppar/topallhdg_new.pro END
parameter @/PROGRAMS/xplor-nih-2.9.1/toppar/parallhdg_new.pro END
segment
  name=" "
  SETUP=TRUE
  chain
    LINK PEPP HEAD - * TAIL + PRO END { LINK to PRO }
    LINK PEPT HEAD - * TAIL + * END
    FIRSt PROP TAIL + PRO END
    FIRSt NTER TAIL + * END
    LAST CTER HEAD - * END
    sequence
      MET THR LEU LYS
      ...
      HIS HIS HIS
    end
  end
end
write psf output=PROTEIN.psf end
stop

Read parameter and topology files:

Initiate a segment. Repeat for each individual chain or component of the structure:

Definitions in the topology file on how to make a peptide bond and cap the N-terminus and C-terminus:

Complete protein sequence:

Write out the PSF file with name PROTEIN.psf:
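Typing the three-letter codes for the sequence block by hand is error-prone for long proteins. The hypothetical helper below turns a one-letter sequence into a `sequence ... end` block that could be pasted into a PSF script like the one above. It is a minimal sketch that covers only the 20 standard amino acids; residue names for non-standard components defined in the topology file are not handled.

```python
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def xplor_sequence_block(seq, per_line=10):
    """Turn a one-letter sequence into an XPLOR 'sequence ... end' block.

    Whitespace in seq is ignored; an unknown letter raises KeyError so that
    typos are caught instead of being silently dropped.
    """
    codes = [THREE_LETTER[aa] for aa in seq.upper() if not aa.isspace()]
    lines = ["sequence"]
    for i in range(0, len(codes), per_line):
        lines.append("  " + " ".join(codes[i:i + per_line]))
    lines.append("end")
    return "\n".join(lines)


print(xplor_sequence_block("MTLK HHH", per_line=4))
# sequence
#   MET THR LEU LYS
#   HIS HIS HIS
# end
```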

Page 45

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

XPLOR PSF Script: Patches

      HIS HIS HIS
    end
  end
end
patch CISP
  reference="-"=(residue 109)
  reference="+"=(residue 110)
end
patch DISU
  reference=1=(residue 29)
  reference=2=(residue 57)
end
patch ltod
  reference=nil=(resid 8)
end
write psf output=PROTEIN.psf end
stop

Create a cis peptide bond between residues 109 (P) and 110:

Create a disulphide bond between residues 29 and 57:

Convert residue 8 to a D-amino acid

Page 46

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

XPLOR PSF Script Using Structures and Multiple Segments

rtf @/PROGRAMS/xplor-nih-2.9.1/toppar/[…] @/PROGRAMS/xplor-nih-2.9.1/toppar/[…] END
segment
  name="PROT"
  SETUP=TRUE
  chain
    LINK PEPP HEAD - * TAIL + PRO END { LINK to PRO }
    LINK PEPT HEAD - * TAIL + * END
    coordinates @PROTEIN.pdb end
  end
end
segment
  name="MOLE"
  SETUP=TRUE
  CHAIN
    sequence CPD end
  end
end
write psf output=PROTEIN.psf end
stop

Read in your parameter and topology files, including those defining the molecule:

Instead of listing the sequence, read in a PDB file:

Define segment “MOLE” that contains a single copy of the molecule (note: no LINK used):

Page 47

Typical extended structure created by XPLOR based on a PSF file

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations
► Second Step is to create a linear extended structure of the protein sequence using idealized geometry

Extended structure coordinate file (EXT):
- standard XPLOR PDB coordinate file
- starting point to generate a proper fold for the protein from experimental data
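An extended starting chain can be generated from idealized bond lengths and angles with all backbone torsions set to 180°. The sketch below shows one way to do that (the NeRF construction, which places each atom from the previous three); it is not XPLOR's own routine. The ideal values (N-CA 1.458 Å, CA-C 1.525 Å, C-N 1.329 Å, N-CA-C 111.2°, CA-C-N 116.2°, C-N-CA 121.7°) are assumed textbook numbers rather than values read from parallhdg_new.pro, and only the backbone N, CA and C atoms are built. Because bond lengths and angles stay fixed and only torsions are chosen, it also illustrates the torsion-angle representation that CYANA works in.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / length, a[1] / length, a[2] / length)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF: place atom d so that |c-d| = bond, angle(b,c,d) = angle_deg
    and dihedral(a,b,c,d) = torsion_deg."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    theta, chi = math.radians(angle_deg), math.radians(torsion_deg)
    d2 = (-bond * math.cos(theta),
          bond * math.sin(theta) * math.cos(chi),
          bond * math.sin(theta) * math.sin(chi))
    return tuple(c[i] + d2[0] * bc[i] + d2[1] * m[i] + d2[2] * n[i]
                 for i in range(3))


# Assumed ideal backbone geometry (Angstrom and degrees)
BOND = {"N-CA": 1.458, "CA-C": 1.525, "C-N": 1.329}
ANGLE = {"N-CA-C": 111.2, "CA-C-N": 116.2, "C-N-CA": 121.7}


def extended_backbone(n_res, phi=180.0, psi=180.0, omega=180.0):
    """Return [(atom name, (x, y, z)), ...] for N, CA, C of n_res residues."""
    t = math.radians(ANGLE["N-CA-C"])
    n = (0.0, 0.0, 0.0)
    ca = (BOND["N-CA"], 0.0, 0.0)
    c = (ca[0] - BOND["CA-C"] * math.cos(t), BOND["CA-C"] * math.sin(t), 0.0)
    atoms = [("N", n), ("CA", ca), ("C", c)]
    for _ in range(1, n_res):
        # N(i+1): torsion N(i)-CA(i)-C(i)-N(i+1) is psi
        a, b, c = (p for _, p in atoms[-3:])
        atoms.append(("N", place_atom(a, b, c, BOND["C-N"], ANGLE["CA-C-N"], psi)))
        # CA(i+1): torsion CA(i)-C(i)-N(i+1)-CA(i+1) is omega
        a, b, c = (p for _, p in atoms[-3:])
        atoms.append(("CA", place_atom(a, b, c, BOND["N-CA"], ANGLE["C-N-CA"], omega)))
        # C(i+1): torsion C(i)-N(i+1)-CA(i+1)-C(i+1) is phi
        a, b, c = (p for _, p in atoms[-3:])
        atoms.append(("C", place_atom(a, b, c, BOND["CA-C"], ANGLE["N-CA-C"], phi)))
    return atoms


chain = extended_backbone(5)
cas = [xyz for name, xyz in chain if name == "CA"]
print(round(math.dist(cas[0], cas[1]), 2))   # consecutive CA-CA distance, ~3.8 Å
```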

Page 48

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations
► Third Step is to convert NMR experimental data into XPLOR format

Distance Constraints: a file (noe.tbl) containing a list of all observed/assigned NOE distance constraints

assign ( resid 3 and name HB# ) ( resid 49 and name HD# ) 4.0 2.2 3.0

XPLOR assign statement

Residue number and atom name for each atom involved in the distance constraint

Distance information

Understanding the distance information (a b c):
- a distance constraint is typically defined with a range as opposed to an absolute number

■ an upper and a lower bound

- in XPLOR format:

■ upper bound = a + c; in our example: upper bound = 4.0Å + 3.0Å = 7.0Å

■ lower bound = a - b; in our example: lower bound = 4.0Å – 2.2Å = 1.8Å
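The arithmetic above can be written as a small function, together with the kind of flat-bottomed penalty that is typically applied to such a range: zero anywhere between the bounds, growing quadratically outside them. The quadratic square-well form and the function names are assumptions for illustration, not a statement of XPLOR's exact NOE potential.

```python
def xplor_bounds(a, b, c):
    """Convert an XPLOR 'a b c' distance triple into (lower, upper) bounds."""
    return a - b, a + c


def flat_bottom_penalty(d, lower, upper, k=1.0):
    """Zero inside [lower, upper]; k * (violation)^2 outside (square-well form)."""
    if d < lower:
        return k * (lower - d) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0


lower, upper = xplor_bounds(4.0, 2.2, 3.0)   # the assign statement above
print(f"{lower:.1f} {upper:.1f}")            # -> 1.8 7.0
for d in (1.5, 4.0, 7.5):
    print(d, flat_bottom_penalty(d, lower, upper))
```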

Page 49

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

Distance Constraints

Pseudo-Atoms/Wildcards

assign ( resid 3 and name HB# ) ( resid 49 and name HD# ) 4.0 2.2 3.0

What atom is HB# or HD#?
- Recall the PDB atom nomenclature:

■ each atom gets a unique atom identifier, but

■ each atom does not have a unique NMR resonance

■ a distance constraint to the Ala methyl needs to go to HB1, HB2 and HB3

- XPLOR represents these equivalent atoms with a single pseudo-atom that is positioned equidistant between them

■ in the assign statement the equivalent atoms are represented with a wildcard (# or *)

# - represents 1 character, i.e. HB# matches HB1 & HB2

* - represents 2 characters, i.e. HD* matches HD11, HD12, HD13 & HD21, HD22, HD23 (the 2 Leu methyls)

■ the distance constraint is to the pseudo-atom

(Figure: alanine residue with atoms N, HN, CA, HA, CB, HB1, HB2, HB3, C and O, and the pseudo-atom (HB#).)

Page 50

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

Distance Constraints

Pseudo-Atoms/Wildcards

Why Not Just Use Multiple Assign Statements?
- For a distance constraint between two sets of Leu methyls there would be 36 possible combinations (6 × 6 protons)!
- Multiple constraints between the same sets of atoms would bias or overemphasize that distance constraint relative to others

■ Each constraint would contribute independently to a violation energy that XPLOR attempts to minimize.

■ Each duplication of a constraint that is violated would increase the likelihood that that constraint would be satisfied at the expense of other constraints

■ Tipping the balance of energy to favor one constraint

- All the hydrogens may not be simultaneously satisfied for any given conformation.

■ XPLOR will try to satisfy all the constraints, leading to a distorted structure.

assign ( resid 14 and name HD* ) ( resid 97 and name HD* ) 4.0 2.2 5.8


Page 51

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

Distance Constraints

Pseudo-Atoms/Wildcards

assign ( resid 14 and name HD* ) ( resid 97 and name HD* ) 4.0 2.2 5.8

Why Not Just Choose One Hydrogen to Represent the Set?
- Which one do you choose?
- How do you make the proper choice when there are multiple distance constraints going to the same set of hydrogens and when the constraints are coming from very different directions?

Using Pseudo-Atoms is Not a Perfect Solution.
- the distance constraint is going to a location that is spatially distinct from any of the real atoms
- it goes to a central, averaged location
- the distance constraints need to be adjusted to account for the location of the pseudo-atom

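The offset between a pseudo-atom and the real protons can be estimated with a quick calculation. This minimal sketch places the pseudo-atom at the geometric center of an idealized methyl group; the C-H length of 1.09 Å and the tetrahedral angle are assumed values, and the coordinates are hypothetical rather than taken from any structure.

```python
import math


def centroid(points):
    """Geometric center of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def methyl_hydrogens(bond=1.09, tilt_deg=109.47):
    """Three idealized methyl protons around a carbon at the origin.

    The C-C bond is taken along +z; each C-H bond makes tilt_deg with it.
    """
    t = math.radians(tilt_deg)
    return [(bond * math.sin(t) * math.cos(math.radians(120 * k)),
             bond * math.sin(t) * math.sin(math.radians(120 * k)),
             bond * math.cos(t))
            for k in range(3)]


hs = methyl_hydrogens()
pseudo = centroid(hs)                     # pseudo-atom position (e.g. Ala HB#)
offsets = [math.dist(pseudo, h) for h in hs]
print([round(x, 2) for x in offsets])     # each proton is ~1.03 Å away from it
```

An offset of roughly 1 Å between the pseudo-atom and every real proton is why distance bounds to a pseudo-atom have to be loosened, as the correction rules for pseudo-atoms describe.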

Page 52

How are the Distance Assignments Made?
- One common approach uses a qualitative analysis of the NMR data to cluster the assignments as strong, medium, weak and very weak based on the intensity of the NOE crosspeak.
- The following rules apply (values are a b c in XPLOR format):

Strong      2.5 0.7 0.2   (for NH-NH constraints use: 2.5 0.7 0.6)
Medium      3.0 1.2 0.3   (for NOEs with NH use: 3.0 1.2 0.5)
Weak        4.0 2.2 1.0
Very Weak   5.0 3.2 1.0

The lower limit is always set to slightly less than twice the hydrogen van der Waals radius (1.8Å).

For hydrogen bond constraints:
constraint between O & N    2.8 0.4 0.5
constraint between O & HN   1.8 0.3 0.5

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

Distance Constraints

Pseudo-Atoms/Wildcards

assign ( resid 14 and name HD* ) ( resid 97 and name HD* ) 4.0 2.2 3.0

Distance information
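The intensity classes can be encoded directly so that every NOE is written in the same way. In this sketch the (a, c) pairs and the two NH exceptions are copied from the list of rules, and b is derived from the stated rule that the lower limit is always 1.8 Å (which gives b = 3.2 for the very weak class). The function name and its flags are hypothetical.

```python
LOWER_LIMIT = 1.8   # Angstrom: lower bound used for every class

# (a, c) for each NOE intensity class; b is derived so that a - b = 1.8
NOE_CLASSES = {
    "strong": (2.5, 0.2),
    "medium": (3.0, 0.3),
    "weak": (4.0, 1.0),
    "very weak": (5.0, 1.0),
}


def noe_assign(res1, atom1, res2, atom2, klass, nh_nh=False, with_nh=False):
    """Build an XPLOR assign line for one NOE following the class rules."""
    a, c = NOE_CLASSES[klass]
    if klass == "strong" and nh_nh:
        c = 0.6                      # NH-NH exception for strong NOEs
    elif klass == "medium" and with_nh:
        c = 0.5                      # exception for medium NOEs involving NH
    b = a - LOWER_LIMIT
    return (f"assign ( resid {res1} and name {atom1} ) "
            f"( resid {res2} and name {atom2} ) {a:.1f} {b:.1f} {c:.1f}")


print(noe_assign(3, "HB#", 49, "HD#", "weak"))
# -> assign ( resid 3 and name HB# ) ( resid 49 and name HD# ) 4.0 2.2 1.0
```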

Page 53

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Distance Constraints

Rules for pseudo-atom distance corrections:

1) For non-stereoassigned CβH2 protons, add 1.0Å to the upper bound if HB# is used instead of HB1 or HB2

2) 1.0Å is added to the upper bound for other methylenes if HG#, HD# or HE# is used instead of HG1, HG2, etc.

3) 2.0Å is added to the upper bound for aromatic CδH and CεH if HD# or HE# is used instead of HD1, HD2, etc. for Tyr and Phe

4) 1.5Å is added to the upper bound for the CH3 protons of methyls if HB# is used for Ala HB1, HB2, HB3 or HG2# is used for HG21, HG22, HG23 of Thr or any other methyl

5) 2.4Å is added to the upper bound for non-stereoassigned methyls if HG* or HD* is used for Val or Leu methyls

6) Corrections are additive:
- a distance constraint from an HB# to an HD* would add 1.0Å + 2.4Å = 3.4Å
- a distance constraint between two HD* would add 2 × 2.4Å = 4.8Å
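Rules 1-6 amount to a lookup table plus a sum. In this minimal sketch the category names are hypothetical labels chosen for illustration (the caller decides which category each side of the NOE falls into); the values are those of rules 1-5, and rule 6 is the addition. Applied to a weak NOE (4.0 2.2 1.0) between two sets of Leu methyls, it reproduces the 4.0 2.2 5.8 statement used earlier for two HD* groups.

```python
# Upper-bound corrections (Angstrom) per side, following rules 1-5.
PSEUDO_CORRECTION = {
    "specific": 0.0,             # stereo-assigned or single proton, no pseudo-atom
    "methylene": 1.0,            # rules 1-2: HB#, HG#, HD#, HE# of CH2 groups
    "aromatic": 2.0,             # rule 3: Tyr/Phe ring HD#, HE#
    "methyl": 1.5,               # rule 4: e.g. Ala HB#, Thr HG2#
    "nonstereo_methyls": 2.4,    # rule 5: Val HG*, Leu HD*
}


def corrected_c(c, kind1, kind2):
    """Add the pseudo-atom corrections of both partners to c (rule 6: additive)."""
    return c + PSEUDO_CORRECTION[kind1] + PSEUDO_CORRECTION[kind2]


# Weak NOE (4.0 2.2 1.0) between the two sets of Leu methyls (HD* to HD*):
c = corrected_c(1.0, "nonstereo_methyls", "nonstereo_methyls")
print(f"assign ( resid 14 and name HD* ) ( resid 97 and name HD* ) 4.0 2.2 {c:.1f}")
```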

Page 54

Distance   Pseudoatom replacement   Intraresidue correction [Å]   Interresidue correction [Å]

NH-CH HB# 0.6 1.0

NH-CH HG# 1.0 1.0

CH-CH HB# 0.6 1.0

CH-CH HG# 0.6 1.0

NH-CH (Val) NH-CH (Leu)

HG1# , HG2#, HD1#, HD2#HG*, HD*

1.0 1.7

1.0 2.4

CH-CH (Val) CH-CH (Val)

HG1# , HG2#, HD1#, HD2#HG*, HD*

0.6 0.6

1.0 2.4

NH-CringH HD#, HE# 2.4 2.4

CH-CringH HD#, HE# 2.0 2.4

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

Distance Constraints

Additional specific rules for pseudo-atom corrections:
- differentiate between NH and CH (NH is more labile, so longer distances are used)
- differentiate between intra- and inter-residue distances (steric reasons limit some intra-residue distances)

Page 55

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Dihedral Constraints: a file (dihedral.tbl) containing a list of φ, ψ, χ1 and χ2 constraints

assign (resid 1 and name c ) (resid 2 and name n ) (resid 2 and name ca ) (resid 2 and name c ) 1.0 -93.57 30.0 2

assign (resid 2 and name n ) (resid 2 and name ca ) (resid 2 and name c ) (resid 3 and name n ) 1.0 121.89 50.0 2

assign (resid 10 and name n ) (resid 10 and name ca ) (resid 10 and name cb ) (resid 10 and name cg or cg1 or og or og1 ) 1.0 -60.0 20.0 2

assign (resid 143 and name ca ) (resid 143 and name cb ) (resid 143 and name cg ) (resid 143 and name cd1 or nd1 ) 1.0 90.0 30.0 2

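The four numbers at the end of each assign statement are, in order:
- force constant (kcal mol-1 rad-2)
- dihedral angle target
- range around the restraint angle (±)
- exponent of the restraint function

XPLOR assign statement: atoms and residues involved in the dihedral constraint

Different possible atom types, depending on the amino acid (e.g. cg, cg1, og or og1 as the fourth atom of χ1)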
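One way to read the four numbers is as a flat-bottomed restraint: no penalty within target ± range, and a penalty of force constant × (excess)^exponent outside it. The sketch below assumes that form and wraps angle differences so that, for example, 175° and -175° are 10° apart; it illustrates the meaning of the parameters and does not reproduce XPLOR's internal implementation.

```python
import math


def wrap180(deg):
    """Map an angle difference into [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def dihedral_restraint_energy(phi, k, target, rng, exponent):
    """Flat-bottom restraint: zero within target +/- rng, k * excess^exponent outside.

    Angles are in degrees; the excess is converted to radians to match a force
    constant given in kcal mol-1 rad-2.
    """
    excess = max(0.0, abs(wrap180(phi - target)) - rng)
    return k * math.radians(excess) ** exponent


# phi restraint from the first assign statement above: 1.0 -93.57 30.0 2
for phi in (-93.57, -70.0, -150.0, 170.0):
    print(phi, dihedral_restraint_energy(phi, 1.0, -93.57, 30.0, 2))
```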

Page 56

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Carbon Chemical Shift Constraints: a file (carbon.tbl) containing a list of 13Cα and 13Cβ constraints

assign (resid 1 and name c) (resid 2 and name n) (resid 2 and name ca) (resid 2 and name c) (resid 3 and name n) 53.1 43.0

XPLOR assign statement

Backbone atoms of residue 2

Backbone atoms directly preceding and following residue 2

53.1: Cα chemical shift of residue 2

43.0: Cβ chemical shift of residue 2

Carbon chemical shifts are related to φ, ψ

assign (resid 55 and name c) (resid 56 and name n) (resid 56 and name ca) (resid 56 and name c) (resid 57 and name n) 44.2 $noexpectation

Carbon chemical shift assignment for a Gly

No Cβ chemical shift for a Gly ($noexpectation)

Can also be used for other residues with missing assignments

Kuszewski et al. (1995) J. Magn. Reson. B 106:92

Page 57

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Coupling Constant Constraints: a file (coupling.tbl) containing a list of 3J(HN-Hα) constraints

assign (resid 2 and name c) (resid 3 and name n) (resid 3 and name ca) (resid 3 and name c) 9.94 0.5

XPLOR assign statement

Backbone atoms of residue 3

Backbone atom directly preceding residue 3

The coupling constant is related to φ

9.94: 3J(HN-Hα) coupling constant (Hz) for residue 3

0.5: range around the coupling constant (±J), i.e. the typical experimental error in J
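The link between the coupling constant and φ is a Karplus relation, 3J = A cos²θ + B cosθ + C with θ = φ - 60°. The sketch below assumes the coefficients A = 6.51, B = -1.76, C = 1.60 Hz (Vuister & Bax, 1993); other parameterizations exist, and these are not necessarily the values an XPLOR script uses. It scans φ to find which backbone angles are compatible with the 9.94 ± 0.5 Hz restraint shown above.

```python
import math

# Karplus coefficients (Hz), assumed values from Vuister & Bax (1993)
A, B, C = 6.51, -1.76, 1.60


def karplus_hn_ha(phi_deg):
    """3J(HN-Halpha) in Hz for backbone angle phi, with theta = phi - 60 degrees."""
    cos_t = math.cos(math.radians(phi_deg - 60.0))
    return A * cos_t ** 2 + B * cos_t + C


def phi_window(j_obs, tol, step=1):
    """Integer phi values (degrees) whose predicted J lies within j_obs +/- tol."""
    return [phi for phi in range(-180, 180, step)
            if abs(karplus_hn_ha(phi) - j_obs) <= tol]


window = phi_window(9.94, 0.5)      # the coupling.tbl example: 9.94 +/- 0.5 Hz
print(min(window), max(window), karplus_hn_ha(-120.0))
```

With these coefficients the largest predicted value is 9.87 Hz, at φ = -120°, so it is the ±0.5 Hz range that makes the 9.94 Hz restraint satisfiable; the compatible window is a narrow band centred on -120°. For smaller couplings the Karplus curve is multivalued, so a single J value can correspond to several φ ranges.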

Page 58

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Empirical backbone-backbone hydrogen bond constraints: a file (hbdb.tbl) containing a list of hydrogen-bond constraints
- this file is in addition to the distance constraint file (noe.tbl), which will also have hydrogen-bond constraints
- the two files have different goals:

– noe.tbl: keep atoms within hydrogen-bonding distance
– hbdb.tbl: optimize the length and angle of the hydrogen bond to within known parameters

assign (acc and resid 99 and name O ) ( don and resid 103 and name HN )

H-bond acceptor (acc) atom and residue
H-bond donor (don) atom and residue

Grishaev & Bax (2004) J. Am. Chem. Soc. 126:7281

Page 59

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Empirical backbone-backbone hydrogen bond constraints

Variables that define the relative arrangement of the atomic frame of the donor and acceptor peptidyl units.

these parameters are optimized by XPLOR

Page 60

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Empirical backbone-backbone hydrogen bond constraints

Typical Range of Hydrogen Bond Conformations

α-helix, antiparallel β-sheet, parallel β-sheet, long-range

Page 61

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

Ramachandran Database
- does not require a user-defined experimental data file
- refines the structure based on the expected φ, ψ, χ1 and χ2 angles from the PDB
- uses standard expectation files that are part of the XPLOR-NIH distribution
- simply invoked as part of an XPLOR refinement script

set message off echo off end
rama
  nres=10000
  ! intraresidue protein torsion angles
  @/home/PROGRAMS/xplor-nih-2.9.1/databases/torsions_gaussians/shortrange_gaussians.tbl
  @/home/PROGRAMS/xplor-nih-2.9.1/databases/torsions_gaussians/new_shortrange_force.tbl
  ! interresidue protein torsion angle correlations i with i+/-1
  @/home/PROGRAMS/xplor-nih-2.9.1/databases/torsions_gaussians/longrange_gaussians.tbl
  @/home/PROGRAMS/xplor-nih-2.9.1/databases/torsions_gaussians/longrange_4D_hstgp_force.tbl
end
@/home/PROGRAMS/xplor-nih-2.9.1/databases/torsions_gaussians/newshortrange_setup.tbl
@/home/PROGRAMS/xplor-nih-2.9.1/databases/torsions_gaussians/setup_4D_hstgp.tbl

Turns on the Ramachandran database function

Automatically sets-up all the expected torsion angles for the protein sequence

Kuszewski et al. (1996) Protein Sci. 5:1067
Kuszewski et al. (1997) J. Magn. Reson. 125:171

Page 62

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Ramachandran Database: refines the structure based on the expected φ, ψ, χ1 and χ2 angles from the PDB

Page 63

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

Radius of Gyration (Rg): the radius of gyration of a group of atoms is defined as the root-mean-square distance from each atom of the molecule to their centroid:

$$R_g^2 = \frac{1}{2N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(\mathbf{r}_i - \mathbf{r}_j\right)^2$$

where ri and rj are the position vectors of atoms i and j, and N is the number of atoms. Rg measures the compactness of the protein. For globular proteins, Rg can be predicted on the basis of the number of residues (N) in the protein.

Different predicted Rg for different oligomer shapes

Correlation between predicted and calculated Rg for X-ray/NMR structures
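The two ways of stating Rg, as the root-mean-square distance to the centroid and as an average over all atom pairs, give the same number. The short sketch below computes both on a few hypothetical coordinates to make that equivalence concrete.

```python
import math


def rg_centroid(coords):
    """Rg as the root-mean-square distance of the atoms from their centroid."""
    n = len(coords)
    center = tuple(sum(p[k] for p in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)


def rg_pairwise(coords):
    """Same quantity from all pairwise distances: Rg^2 = sum_ij |ri - rj|^2 / (2 N^2)."""
    n = len(coords)
    total = sum(math.dist(p, q) ** 2 for p in coords for q in coords)
    return math.sqrt(total / (2 * n * n))


atoms = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
print(rg_centroid(atoms), rg_pairwise(atoms))   # the two definitions agree
```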

Page 64

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations

Radius of Gyration (Rg)
- does not require a user-defined experimental data file
- refines the structure based on the expected Rg
- simply invoked as part of an XPLOR refinement script

Important caveats:
- do not include the dynamic, flexible N- and C-termini
- the selection needs to represent the globular region of the protein
- for an extended protein structure, divide the protein into segments resembling globular regions and use multiple radius of gyration definitions

collapse
  assign (resid 1:111) 100.0 12.67
  scale 1.0
end

collapse: turns on the radius of gyration target function

(resid 1:111): the function will be applied to this residue range

100.0: force constant

12.67: target radius of gyration, [2.2 × (number of residues)^0.38] − 0.5 Å
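The target value in the collapse statement follows from the empirical expression: for 111 residues, 2.2 × 111^0.38 − 0.5 ≈ 12.67 Å. The hypothetical helper below computes the target and writes the block. Per the caveats above, the residue range passed in should cover only the globular region and exclude the flexible termini.

```python
def predicted_rg(n_residues):
    """Target Rg in Angstrom from the empirical expression: 2.2 * N**0.38 - 0.5."""
    return 2.2 * n_residues ** 0.38 - 0.5


def collapse_block(first, last, force=100.0):
    """Write an XPLOR collapse block for residues first..last (hypothetical helper)."""
    rg = predicted_rg(last - first + 1)
    return (f"collapse\n"
            f"  assign (resid {first}:{last}) {force:.1f} {rg:.2f}\n"
            f"  scale 1.0\n"
            f"end")


print(collapse_block(1, 111))
```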

Page 65

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations
► Fourth Step is to fold the extended structure using the NMR experimental data.

Distance Geometry (DG)
- converts a molecule represented as a series of distances into Cartesian coordinates
- for N atoms there are N(N − 1)/2 distances but only 3N coordinates
- provides an initial set of structures consistent with the NMR experimental data

Given n atoms a1, …, an and a set of distances di,j between ai and aj, (i,j) ∈ S, find the coordinates x1, …, xn for a1, …, an such that

$$\lVert x_i - x_j \rVert = d_{i,j}, \qquad (i,j) \in S$$

This method is based on calculating a matrix of distance constraints for each pair of atoms from all available distance constraints, bond and torsion angles, and van der Waals radii. This set of distances is then projected from the n-dimensional distance space into the three-dimensional space of a Cartesian coordinate system, which determines the coordinates of all atoms of the protein.

Page 66

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Distance Geometry (DG) Matrix Algebra

A matrix D can be constructed containing the distance between every pair of atoms.

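the distance from every point i to the center of mass (O) can be calculated:

$$d_{iO}^2 = \frac{1}{n}\sum_{j=1}^{n} d_{ij}^2 - \frac{1}{n^2}\sum_{j=2}^{n}\sum_{k=1}^{j-1} d_{jk}^2$$

the metric matrix G is then defined as the matrix where each element gij is the dot product of the two vectors from the center of mass to the points i and j:

$$g_{ij} = \frac{1}{2}\left(d_{iO}^2 + d_{jO}^2 - d_{ij}^2\right)$$

for an exact set of three-dimensional distances, the matrix G has at most three eigenvalues greater than zero, λ1, λ2, λ3. If a matrix W is constructed using the corresponding normalized eigenvectors as its columns, the Cartesian coordinates (xi, yi, zi) can be calculated as:

$$x_i = \sqrt{\lambda_1}\,w_{i1}, \qquad y_i = \sqrt{\lambda_2}\,w_{i2}, \qquad z_i = \sqrt{\lambda_3}\,w_{i3}$$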
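The metric-matrix recipe can be checked numerically in the idealized case of a complete, exact distance matrix: the distances to the centroid give G, and its three largest eigenvalues and eigenvectors give coordinates that reproduce every distance (up to rotation, translation and reflection). This minimal sketch assumes NumPy is available and uses random hypothetical "atoms"; it is exactly the idealized situation that real, incomplete NMR bounds do not provide.

```python
import numpy as np


def embed(D):
    """Return n x 3 coordinates from a complete n x n distance matrix."""
    D2 = np.asarray(D, dtype=float) ** 2
    n = D2.shape[0]
    # squared distance of each point to the centroid O; the double sum over
    # j < k equals half the full double sum, hence the 1/(2 n^2) factor
    d_io2 = D2.sum(axis=1) / n - D2.sum() / (2.0 * n * n)
    # metric matrix g_ij = (d_iO^2 + d_jO^2 - d_ij^2) / 2
    G = 0.5 * (d_io2[:, None] + d_io2[None, :] - D2)
    evals, evecs = np.linalg.eigh(G)            # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:3]           # indices of the three largest
    lam = np.clip(evals[top], 0.0, None)        # guard against tiny negatives
    return evecs[:, top] * np.sqrt(lam)         # x_i = sqrt(lambda_1) w_i1, etc.


def distances(X):
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


rng = np.random.default_rng(0)
X = rng.normal(scale=5.0, size=(10, 3))          # hypothetical "true" atoms
Y = embed(distances(X))
print(np.abs(distances(Y) - distances(X)).max())  # close to zero
```

Because only distances go in, the result may come out as the mirror image of the original coordinates; that ambiguity is the chirality issue raised for distance matrices.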

Page 67

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Distance Geometry (DG) Matrix Algebra

Cannot Be Implemented As Described:

1) Distance Matrix D is Neither Complete Nor Accurate

2) Metric Matrix G calculated from Such a Matrix May Not Only Have Three Positive Eigenvalues

To Obtain A Reasonable Starting Matrix D:

1) build matrix L of lower distance bounds
2) build matrix U of upper distance bounds
3) triangle-smooth matrix U

a) for every triple of points (i,j,k) in turn
b) the distance dij may be unknown, but the upper bound uij must be less than or equal to the sum of the upper bounds uik and ukj (uij ≤ uik + ukj)

4) a similar inverse triangle inequality is applied to matrix L (lij ≥ lik − ukj)
5) construct an initial trial matrix D by choosing elements dij randomly between lij (matrix L) and uij (matrix U)
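Steps 3-5 can be sketched with a Floyd-style triple loop: each pass through an intermediate atom k tightens the upper bounds with the triangle inequality and raises the lower bounds with the inverse triangle inequality, and a lower bound that ends up above its upper bound signals contradictory restraints. This is a minimal illustration on a hypothetical three-atom example (99.0 stands in for an unknown upper bound), not the routine used by any particular program.

```python
import random


def smooth_bounds(lower, upper):
    """Triangle-inequality bound smoothing (Floyd-style triple loop).

    upper: u_ij <= u_ik + u_kj        (triangle inequality)
    lower: l_ij >= l_ik - u_kj        (inverse triangle inequality)
    Matrices are symmetric lists of lists; smoothed copies are returned.
    """
    n = len(upper)
    L = [row[:] for row in lower]
    U = [row[:] for row in upper]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                U[i][j] = min(U[i][j], U[i][k] + U[k][j])
                L[i][j] = max(L[i][j], L[i][k] - U[k][j], L[j][k] - U[k][i])
                if L[i][j] > U[i][j]:
                    raise ValueError(f"inconsistent bounds for atoms {i} and {j}")
    return L, U


def random_trial_matrix(L, U, rng):
    """Step 5: pick each d_ij uniformly between l_ij and u_ij."""
    n = len(U)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = rng.uniform(L[i][j], U[i][j])
    return D


UNKNOWN = 99.0
U = [[0.0, 5.0, UNKNOWN], [5.0, 0.0, 1.5], [UNKNOWN, 1.5, 0.0]]
L = [[0.0, 4.0, 0.0], [4.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
L2, U2 = smooth_bounds(L, U)
print(L2[0][2], U2[0][2])                 # -> 2.5 6.5
D = random_trial_matrix(L2, U2, random.Random(1))
```

The unknown 1-3 distance is now bracketed between 4.0 − 1.5 = 2.5 Å and 5.0 + 1.5 = 6.5 Å, so the random trial distance is drawn from a physically sensible range.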

Page 68

Protein Based Programs: Calculate Protein Structures
► General overview of XPLOR Protein Structure Calculations

Distance Geometry (DG) Matrix Algebra

Because of the Way the Trial Matrix is Generated:

1) May not correspond to any three dimensional structure

2) May be a distance matrix representing an M-dimensional set of coordinates

3) May have M positive eigenvalues, and truncation after the three largest eigenvalues of G will be a rather poor approximation

The Generated Three-dimensional coordinates will require some further refinement.

Distance Matrix Does Not Naturally Include Chirality of Molecules

1) both L and D isomers of amino acid have exactly the same distance matrix

2) corrected in refinement process
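The chirality problem is easy to demonstrate: reflecting every coordinate leaves all interatomic distances unchanged, while the sign of the signed volume (triple product) around a chiral center flips. The sketch below uses hypothetical coordinates for a center and three substituents; a check of this kind is one way a refinement stage can tell the two enantiomers apart.

```python
import math


def signed_volume(center, a, b, c):
    """Triple product (a - center) . ((b - center) x (c - center)); sign = handedness."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def distance_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]


# Hypothetical chiral center (first point) with three substituents, and its mirror image.
center_and_subs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mirror = [(-x, y, z) for x, y, z in center_and_subs]

print(distance_matrix(center_and_subs) == distance_matrix(mirror))   # -> True
print(signed_volume(*center_and_subs), signed_volume(*mirror))       # -> 1.0 -1.0
```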

Page 69

Protein Based Programs: Calculate Protein Structures

► General overview of XPLOR Protein Structure Calculations
► Fifth Step is to refine the DG structure using simulated annealing

Simulated Annealing (SA)
- include additional structural constraints (Rg, Rama, RDC, chemical shifts, etc.)
- conceptually, raise the “temperature” of the protein (1500-300 K)
- move the protein’s structure around to sample different conformations
- low barriers to movement allow atoms to pass through each other
- slowly “cool” the sample (lower the temperature)
- slowly increase the forces associated with the structure constraints
- the structure will “anneal” to a low-energy conformation consistent with the structural and experimental constraints
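XPLOR carries out simulated annealing with molecular dynamics. To illustrate only the two schedules described above (temperature lowered from 1500 to 300, restraint force constant raised), the toy sketch below swaps in Metropolis Monte Carlo, which is a different sampling technique, on four hypothetical atoms that should all end up 3.5-4.0 Å apart. The step size, force-constant range and number of steps are arbitrary assumptions chosen for the example.

```python
import math
import random

KB = 0.0019872  # Boltzmann constant in kcal mol-1 K-1


def restraint_energy(coords, restraints, k):
    """Flat-bottom distance restraint energy, k * violation^2 summed over restraints."""
    e = 0.0
    for i, j, lo, hi in restraints:
        d = math.dist(coords[i], coords[j])
        if d < lo:
            e += k * (lo - d) ** 2
        elif d > hi:
            e += k * (d - hi) ** 2
    return e


def anneal(coords, restraints, t_start=1500.0, t_end=300.0,
           k_start=1.0, k_end=50.0, n_steps=20000, step=0.3, seed=0):
    """Metropolis Monte Carlo annealing: cool T linearly, raise k geometrically."""
    rng = random.Random(seed)
    coords = [list(p) for p in coords]
    for s in range(n_steps):
        f = s / (n_steps - 1)
        temp = t_start + (t_end - t_start) * f        # slowly "cool"
        k = k_start * (k_end / k_start) ** f          # slowly strengthen restraints
        i = rng.randrange(len(coords))
        old = coords[i]
        e_old = restraint_energy(coords, restraints, k)
        coords[i] = [x + rng.uniform(-step, step) for x in old]
        e_new = restraint_energy(coords, restraints, k)
        # Metropolis rule: always accept downhill moves, sometimes uphill ones
        if e_new > e_old and rng.random() >= math.exp(-(e_new - e_old) / (KB * temp)):
            coords[i] = old                           # reject: restore atom i
    return [tuple(p) for p in coords]


# Toy problem: four atoms that should all be 3.5-4.0 Å apart (a tetrahedron),
# starting far from that arrangement.
start = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
pairs = [(i, j, 3.5, 4.0) for i in range(4) for j in range(i + 1, 4)]
final = anneal(start, pairs)
print(restraint_energy(start, pairs, 50.0), restraint_energy(final, pairs, 50.0))
```

Early on, the high temperature and weak restraints let atoms move freely; by the end, the low temperature and strong restraints hold the arrangement near a low-energy conformation that satisfies the restraints.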