![Page 1: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/1.jpg)
Exploring 3D Molecular Structures Using NCBI Tools
A Field Guide
June 17, 2004
![Page 2: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/2.jpg)
NCBI Structure Resources
• Overview of Structural Informatics at NCBI
• How 3D Macromolecular Structures are Determined
• Indexing Structural Data at NCBI
• Finding Homologous Structures
– By Sequence Similarity: BLAST
– By Structure Similarity: VAST
– By Conserved Function: RPS-BLAST and CDD
• Finding a Structural Template for a Query Protein
![Page 3: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/3.jpg)
The National Center for Biotechnology Information
• Created as a part of NLM in 1988
– Establish public databases
– Perform research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
![Page 4: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/4.jpg)
Structural Informatics
Chemical Formula
3D Conformation
Function
ARKLMPQSCSW…, Modifications, Ions, Ligands
Binding Sites, Catalytic Residues, Kinetics, Thermodynamics, Substrates, Intermediates
Structure, Dynamics, Active States, Folding
![Page 5: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/5.jpg)
Structural Informatics
Chemical Formula
3D Conformation
Function
GenPept, NCBI RefSeq, SWISS-PROT, PIR, PRF
Multiple Sequence Alignments: Pfam, SMART, COGs, CDD
PDB
![Page 6: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/6.jpg)
Structural Informatics at NCBI
Chemical Formula
3D Conformation
Function
GenPept, NCBI RefSeq, SWISS-PROT, PIR, PRF
Multiple Sequence Alignments: Pfam, SMART, COGs, CDD
Entrez Protein
Entrez Domains
PDB
Entrez Structure
Entrez 3D Domains
4,818,495 25,003
11,382
103,820
![Page 7: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/7.jpg)
The Entrez System
Entrez
Nucleotide
PubMed
Protein
Taxonomy
Structure Domains
3D Domains
Books
Journals
PMC
OMIM
UniSTS
PopSet
Genome
SNP
UniGene
Gene
GEO
GEO Datasets
MeSH
![Page 8: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/8.jpg)
Solving Structures: X-Ray Crystallography
Bond r (Å)
C-S 1.82
C-C 1.54
C-N 1.47
C-O 1.43
S-H 1.34
C=O 1.20
C-H 1.09
N-H 1.01
O-H 0.96
Electron Density Map
P F I
Resolution
5 Å 3 Å 1 Å T or V?
Challenges
Disorder
Cn3D
![Page 9: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/9.jpg)
More About Resolution
1EJG: Crambin at 0.54 Å (protons!!)
2TMA: Tropomyosin at 15 Å (only alpha carbons!!)
3 Å
“Temperature”
![Page 10: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/10.jpg)
Solving Structures: Nuclear Magnetic Resonance Spectroscopy
Bo
Constraint List
Distances, Dihedral Angles, Orientation
Models consistent with constraints
RMSD (Å)
Cn3D
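The RMSD (Å) quoted for an NMR ensemble describes how far the models deviate from one another. A minimal sketch of the calculation follows, assuming both models list the same atoms in the same order and are already superimposed (no fitting is performed here). The function name and the coordinates are illustrative only and do not come from the slides.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("models must contain the same, non-zero number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matching atoms of the two models
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Two made-up three-atom "models"; every atom is displaced by 1 Å along z
model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(model_1, model_2))
```

A real comparison would normally superimpose the models first; that step is deliberately left out of this sketch.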
![Page 11: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/11.jpg)
PDB
![Page 12: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/12.jpg)
PDB File: Header
HEADER ISOMERASE/DNA 01-MAR-00 1EJ9
TITLE CRYSTAL STRUCTURE OF HUMAN TOPOISOMERASE I DNA COMPLEX
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: DNA TOPOISOMERASE I;
COMPND 3 CHAIN: A;
COMPND 4 FRAGMENT: C-TERMINAL DOMAIN, RESIDUES 203-765;
COMPND 5 EC: 5.99.1.2;
COMPND 6 ENGINEERED: YES;
COMPND 7 MUTATION: YES;
COMPND 8 MOL_ID: 2;
COMPND 9 MOLECULE: DNA (5'-
COMPND 10 D(*C*AP*AP*AP*AP*AP*GP*AP*CP*TP*CP*AP*GP*AP*AP*AP*AP*AP*TP*
COMPND 11 TP*TP*TP*T)-3');
COMPND 12 CHAIN: C;
COMPND 13 ENGINEERED: YES;
COMPND 14 MOL_ID: 3;
COMPND 15 MOLECULE: DNA (5'-
COMPND 16 D(*C*AP*AP*AP*AP*AP*TP*TP*TP*TP*TP*CP*TP*GP*AP*GP*TP*CP*TP*
COMPND 17 TP*TP*TP*T)-3');
COMPND 18 CHAIN: D;
COMPND 19 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 3 EXPRESSION_SYSTEM_COMMON: BACULOVIRUS EXPRESSION SYSTEM;
SOURCE 4 EXPRESSION_SYSTEM_CELL: SF9 INSECT CELLS;
SOURCE 5 MOL_ID: 2;
SOURCE 6 SYNTHETIC: YES;
SOURCE 7 MOL_ID: 3;
SOURCE 8 SYNTHETIC: YES
KEYWDS PROTEIN-DNA COMPLEX, TYPE I TOPOISOMERASE, HUMAN
REMARK 1
REMARK 2
REMARK 2 RESOLUTION. 2.60 ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : X-PLOR 3.1
REMARK 3 AUTHORS : BRUNGER
…
REMARK 280
REMARK 280 CRYSTALLIZATION CONDITIONS: 27% PEG 400, 145 MM MGCL2, 20
REMARK 280 MM MES PH 6.8, 5 MM TRIS PH 8.0, 30 MM DTT
REMARK 290 ...
![Page 13: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/13.jpg)
PDB File: Data
ATOM 1 N TRP A 203 30.156 -4.908 37.767 1.00 50.81 N
ATOM 2 CA TRP A 203 30.797 -4.667 36.431 1.00 49.96 C
ATOM 3 C TRP A 203 30.369 -3.337 35.766 1.00 49.18 C
ATOM 4 O TRP A 203 29.315 -3.238 35.147 1.00 49.27 O
ATOM 5 CB TRP A 203 30.518 -5.863 35.513 1.00 46.77 C
ATOM 6 CG TRP A 203 30.847 -5.651 34.081 1.00 44.60 C
ATOM 7 CD1 TRP A 203 32.028 -5.234 33.553 1.00 49.72 C
ATOM 8 CD2 TRP A 203 29.980 -5.876 32.984 1.00 43.73 C
ATOM 9 NE1 TRP A 203 31.956 -5.191 32.177 1.00 45.45 N
ATOM 10 CE2 TRP A 203 30.704 -5.582 31.805 1.00 45.23 C
ATOM 11 CE3 TRP A 203 28.657 -6.305 32.877 1.00 46.48 C
ATOM 12 CZ2 TRP A 203 30.149 -5.705 30.539 1.00 46.06 C
ATOM 13 CZ3 TRP A 203 28.101 -6.431 31.622 1.00 43.08 C
ATOM 14 CH2 TRP A 203 28.849 -6.131 30.463 1.00 45.77 C
…
Fields, in order: Record Name, Atom Number, Atom Name, Residue Name, Chain ID, Residue Number, X, Y, Z, Occupancy, Temperature Factor
Issues: Justification, Nomenclature
ATOM 1 N TRP A 203 30.156 -4.908 37.767 1.00 50.81
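The "Justification" issue arises because ATOM records are fixed-width: each field sits in set columns, so splitting on whitespace can misread a line. The sketch below slices the columns defined by the PDB format (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, X/Y/Z 31-54, occupancy 55-60, temperature factor 61-66). The helper names are hypothetical, and the formatter exists only to build a correctly justified test line, since the spacing of the slide's example was lost in extraction.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, temp):
    """Build a fixed-width ATOM record (columns 1-66 only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain:1s}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{temp:6.2f}")

def parse_atom_line(line):
    """Slice an ATOM record by column position (0-based Python slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "temp_factor": float(line[60:66]),
    }

# First atom of 1EJ9 chain A, values taken from the slide
line = format_atom_line(1, " N", "TRP", "A", 203, 30.156, -4.908, 37.767, 1.00, 50.81)
atom = parse_atom_line(line)
print(atom)
```

Slicing by column keeps working when neighbouring fields touch, for example a four-digit residue number directly after the chain ID, which is where whitespace splitting breaks.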
![Page 14: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/14.jpg)
From PDB to Entrez
Structure
3D Domains
Protein
Domains
![Page 15: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/15.jpg)
From Coordinates to Models
1EJ9: Human topoisomerase I
![Page 16: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/16.jpg)
Building the Structure Summary
Taxonomy
Pubmed
Protein 3D Domains
Domains
Nucleotide
![Page 17: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/17.jpg)
Indexing into MMDB
Structure
• Import only experimentally determined structures
• Convert to ASN.1
• Verify sequences
inter-residue-bonds { { atom-id-1 { molecule-id 1 , residue-id 1 , atom-id 1 } , atom-id-2 { molecule-id 1 , residue-id 2 , atom-id 9 } } ,
id 1 , name "helix 1" , type helix , location subgraph residues interval { { molecule-id 1 , from 49 , to 61 } } } ,
Add secondary structure
Add chemical bonds
• Create “backbone” model (Cα, P only)
• Create single-conformer model
![Page 18: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/18.jpg)
Structure Indexing
Entrez
• MMDB-ID
• MMDB entry date
• EC number
• Organism
PDB
• Accession
• Release date
• Class
• Source
• Description
• Comment
Ligands
• PDB code
• PDB name
• PDB description
Literature
• Article title
• Author
• Journal
• Publication date
Experimental
• Method
• Resolution
Counters
• Ligand types
• Modified amino acids
• Modified nucleotides
• Modified ribonucleotides
• Protein chains
• DNA chains
• RNA chains
topoisomerase AND 2[dnachaincount] AND human[organism]
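A fielded query like the one above can also be sent to Entrez programmatically. The sketch below only builds a request URL for the NCBI E-utilities ESearch endpoint against the `structure` database; it does not contact the server. The helper name and the `retmax` default are illustrative, and whether the 2004 field names (such as `[dnachaincount]`) are still indexed under the same names today is an assumption that has not been checked.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def structure_search_url(term, retmax=20):
    """Return an ESearch URL for a fielded query against Entrez Structure."""
    params = {"db": "structure", "term": term, "retmax": retmax}
    # urlencode escapes spaces and the square brackets used for field tags
    return ESEARCH + "?" + urlencode(params)

url = structure_search_url("topoisomerase AND 2[dnachaincount] AND human[organism]")
print(url)
```

Fetching the URL returns a list of matching record identifiers; the same pattern would apply to other Entrez databases by changing the `db` parameter.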
![Page 19: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/19.jpg)
Creating Sequence Records
Protein Nucleotide Nucleotide
1EJ9A 1EJ9C 1EJ9D
One record per chain
![Page 20: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/20.jpg)
Building the Structure Summary
![Page 21: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/21.jpg)
Building the Structure Summary
![Page 22: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/22.jpg)
Annotating Secondary Structure
1EJ9: Human topoisomerase I
α-Helices
β-strands
coils/loops
![Page 23: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/23.jpg)
Creating 3D Domains
3D Domain 0: 1EJ9A0 = entire polypeptide
![Page 24: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/24.jpg)
Creating 3D Domains
3D Domains
1EJ9A1
1EJ9A3
1EJ9A2
1EJ9A4
1EJ9A5
< 3 Secondary Structure Elements
![Page 25: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/25.jpg)
Building the Structure Summary
![Page 26: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/26.jpg)
Building the Structure Summary
![Page 27: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/27.jpg)
3D Domain Indexing
Entrez
• SDI
• MMDB-ID
• Accession
• MMDB entry date
• Organism
• Domain number
• Cumulative number
PDB
• Accession
• Release date
• Class
• Source
• Description
• Comment
Literature
• Article title
• Author
• Publication date
Counters
• Modified amino acids
• α-Helices
• β-Strands
• Residues
• Molecular weight
REMEMBER: 3D Domain 0 is the entire polypeptide chain!
4[helixcount] AND 0[strandcount] AND 0[domainno] AND viruses[organism]
Find all viral four helix bundles
![Page 28: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/28.jpg)
Conserved Domains
Weakly conserved serine Active site serine
Sequences Aligned by Function
![Page 29: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/29.jpg)
Linking Sequence to Function
The PSSM: Position Specific Score Matrix
Pos Res A R N D C Q E G H I L K M F P S T W Y V
206 D 0 -2 0 2 -4 2 4 -4 -3 -5 -4 0 -2 -6 1 0 -1 -6 -4 -1
207 G -2 -1 0 -2 -4 -3 -3 6 -4 -5 -5 0 -2 -3 -2 -2 -1 0 -6 -5
208 V -1 1 -3 -3 -5 -1 -2 6 -1 -4 -5 1 -5 -6 -4 0 -2 -6 -4 -2
209 I -3 3 -3 -4 -6 0 -1 -4 -1 2 -4 6 -2 -5 -5 -3 0 -1 -4 0
210 S -2 -5 0 8 -5 -3 -2 -1 -4 -7 -6 -4 -6 -7 -5 1 -3 -7 -5 -6
211 S 4 -4 -4 -4 -4 -1 -4 -2 -3 -3 -5 -4 -4 -5 -1 4 3 -6 -5 -3
212 C -4 -7 -6 -7 12 -7 -7 -5 -6 -5 -5 -7 -5 0 -7 -4 -4 -5 0 -4
213 N -2 0 2 -1 -6 7 0 -2 0 -6 -4 2 0 -2 -5 -1 -3 -3 -4 -3
214 G -2 -3 -3 -4 -4 -4 -5 7 -4 -7 -7 -5 -4 -4 -6 -3 -5 -6 -6 -6
215 D -5 -5 -2 9 -7 -4 -1 -5 -5 -7 -7 -4 -7 -7 -5 -4 -4 -8 -7 -7
216 S -2 -4 -2 -4 -4 -3 -3 -3 -4 -6 -6 -3 -5 -6 -4 7 -2 -6 -5 -5
217 G -3 -6 -4 -5 -6 -5 -6 8 -6 -8 -7 -5 -6 -7 -6 -4 -5 -6 -7 -7
218 G -3 -6 -4 -5 -6 -5 -6 8 -6 -7 -7 -5 -6 -7 -6 -2 -4 -6 -7 -7
219 P -2 -6 -6 -5 -6 -5 -5 -6 -6 -6 -7 -4 -6 -7 9 -4 -4 -7 -7 -6
220 L -4 -6 -7 -7 -5 -5 -6 -7 0 -1 6 -6 1 0 -6 -6 -5 -5 -4 0
221 N -1 -6 0 -6 -4 -4 -6 -6 -1 3 0 -5 4 -3 -6 -2 -1 -6 -1 6
222 C 0 -4 -5 -5 10 -2 -5 -5 1 -1 -1 -5 0 -1 -4 -1 0 -5 0 0
223 Q 0 1 4 2 -5 2 0 0 0 -4 -2 1 0 0 0 -1 -1 -3 -3 -4
224 A -1 -1 1 3 -4 -1 1 4 -3 -4 -3 -1 -2 -2 -3 0 -2 -2 -2 -3
Serine scored differently in these two positions
Active site nucleophile
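To make the position-specific scoring concrete, here is a small sketch that loads a few rows of the matrix shown on this slide (positions 211 and 214 to 219) and scores a short ungapped window against them. Only the numbers come from the slide; the function name, the choice of rows, and the test sequences are illustrative, and this is not how RPS-BLAST itself is implemented.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order used on the slide

# Selected rows copied from the slide: position -> 20 scores
ROWS = {
    211: "4 -4 -4 -4 -4 -1 -4 -2 -3 -3 -5 -4 -4 -5 -1 4 3 -6 -5 -3",
    214: "-2 -3 -3 -4 -4 -4 -5 7 -4 -7 -7 -5 -4 -4 -6 -3 -5 -6 -6 -6",
    215: "-5 -5 -2 9 -7 -4 -1 -5 -5 -7 -7 -4 -7 -7 -5 -4 -4 -8 -7 -7",
    216: "-2 -4 -2 -4 -4 -3 -3 -3 -4 -6 -6 -3 -5 -6 -4 7 -2 -6 -5 -5",
    217: "-3 -6 -4 -5 -6 -5 -6 8 -6 -8 -7 -5 -6 -7 -6 -4 -5 -6 -7 -7",
    218: "-3 -6 -4 -5 -6 -5 -6 8 -6 -7 -7 -5 -6 -7 -6 -2 -4 -6 -7 -7",
    219: "-2 -6 -6 -5 -6 -5 -5 -6 -6 -6 -7 -4 -6 -7 9 -4 -4 -7 -7 -6",
}
PSSM = {pos: dict(zip(AMINO_ACIDS, map(int, row.split()))) for pos, row in ROWS.items()}

def score_window(seq, start):
    """Sum the position-specific scores of seq placed at positions start, start+1, ..."""
    return sum(PSSM[start + i][aa] for i, aa in enumerate(seq))

# The same residue (S) receives different scores at different positions
print(PSSM[211]["S"], PSSM[216]["S"])
# Scoring a window, then the same window with the serine at 216 replaced by alanine
print(score_window("GDSGGP", 214), score_window("GDAGGP", 214))
```

A general substitution matrix would assign one score to serine regardless of where it occurs; the position-specific rows are what let a search penalise changes at some positions more heavily than at others.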
![Page 30: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/30.jpg)
Entrez Domains (CDD v2.00)
A database of Position Specific Score Matrices (PSSMs)
Single Domains
• Pfam (Sanger; pfam01234): Pfam-A seeds, HMM based models representing a wide variety of functional domains derived from SWISS-PROT
• SMART (EMBL; smart00123): HMM based models originally concentrating on eukaryotic signaling domains, now expanding
• CD (NCBI; cd01234): NCBI curated domains based on sequence and structural alignments
Protein Families
• COG (NCBI; COG0123): BLAST based alignments derived from complete proteomes of prokaryotes
![Page 31: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/31.jpg)
CD-Search Output
CD
SMART
Pfam
COG
Click on a colored bar to align your sequence to the CD
![Page 32: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/32.jpg)
CD Summary
Alignment view controls
Cn3D launch
PSSM created
Aligned query
![Page 33: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/33.jpg)
Building the Structure Summary
![Page 34: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/34.jpg)
Building the Structure Summary
Cn3D
![Page 35: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/35.jpg)
Creating Entrez Links
NCBI Taxonomy
Literature from PDB
Sequences
Full Chain
Entrez Structure
Entrez 3D Domains
![Page 36: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/36.jpg)
Links to CDs
CD-Search / RPS-BLAST
1EJ9A
Query: protein sequence Database: PSSMs
pre-computed in Entrez Protein
Enter accession, GI, or FASTA sequence into RPS-BLAST
![Page 37: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/37.jpg)
Finding Homologous Structures
• By sequence similarity: BLAST
• By structural similarity: VAST
• By conserved function: CD-Search
Entrez Protein
Entrez Structure
Entrez 3D Domains
Entrez Domains
![Page 38: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/38.jpg)
BLAST: Sequence Neighbors
BLAST Related Structures
Displays a graphical and text alignment between a query sequence and a similar sequence with structure
Accessed from
• BLink
• Any protein BLAST search
?GVKWKYLEHKGPVFAPPYDPLP
GIKWKFLEHKGPVFAPPYEPLP
![Page 39: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/39.jpg)
BLink Neighbors
EAA05377: ENSANGP00000011118 from A. gambiae
Related Structures
![Page 40: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/40.jpg)
Related Structures from BLASTp
![Page 41: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/41.jpg)
Related Structures Cn3D
![Page 42: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/42.jpg)
VAST: Searching by Structure
Why search for similar structures?
• To find homologs that sequence searches cannot: distant protein homologs often conserve structure more strongly than sequence
• To explore protein evolution: similar protein folds can be used to support different functions
• To identify conserved core elements of a protein fold that can be used to model related proteins of unknown structure
![Page 43: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/43.jpg)
VAST: Structure Neighbors
Vector Alignment Search Tool
For each protein chain, locate SSEs (secondary structure elements), and represent them as individual vectors.
1
2
3
4
5 6
Human IL-4
![Page 44: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/44.jpg)
VAST: Calculate ij
1
2
3
4
5 6
16
4
5
2
14
z
For both the query and target structures,
Calculate the midpoint of each SSE.
For each SSE k, align k along z and project midpoints onto the xy plane.
Then calculate [ij]k for i ≠ k, j ≠ k.
Vector position about the z axis
![Page 45: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/45.jpg)
VAST: Calculate (rik, zik)
3
1
z
For both the query and target structures,
For each SSE k, set the origin at the midpoint of k.
Then calculate rik and zik for the endpoints of SSEs i ≠ k.
Vector position relative to the xy plane
xyz13
r13
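The two preceding slides describe each SSE vector relative to a reference SSE k: an angle about k's axis for pairs of projected midpoints, and a radial distance r and height z for endpoints. The sketch below is one way to compute those quantities with plain vector algebra. It assumes each SSE is given as a (start, end) pair of 3D points; all function names are hypothetical, and it is not VAST's actual code.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def midpoint(sse):
    start, end = sse
    return tuple((s + e) / 2 for s, e in zip(start, end))

def frame(sse_k):
    """Origin at the midpoint of k, with a unit vector along k as the local z axis."""
    start, end = sse_k
    axis = _sub(end, start)
    length = math.sqrt(_dot(axis, axis))
    return midpoint(sse_k), tuple(c / length for c in axis)

def _split(point, sse_k):
    """Return (component perpendicular to z, height along z) of point in k's frame."""
    origin, z_axis = frame(sse_k)
    d = _sub(point, origin)
    z = _dot(d, z_axis)
    return tuple(c - z * u for c, u in zip(d, z_axis)), z

def r_z(point, sse_k):
    """Radial distance from k's axis and height along it, e.g. for an endpoint of SSE i."""
    perp, z = _split(point, sse_k)
    return math.sqrt(_dot(perp, perp)), z

def angle_about_z(sse_i, sse_j, sse_k):
    """Signed angle (radians) between the midpoints of i and j projected onto k's xy plane."""
    _, z_axis = frame(sse_k)
    a, _ = _split(midpoint(sse_i), sse_k)
    b, _ = _split(midpoint(sse_j), sse_k)
    return math.atan2(_dot(z_axis, _cross(a, b)), _dot(a, b))

# Made-up SSE vectors: k lies along z, i and j sit beside it
k = ((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
i = ((1.0, 0.0, -1.0), (1.0, 0.0, 1.0))
j = ((0.0, 2.0, 0.0), (0.0, 2.0, 2.0))
print(r_z(i[1], k), angle_about_z(i, j, k))
```

Working with projections onto k's axis avoids building an explicit rotation matrix; the values are the same as if k had first been rotated onto the z axis.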
![Page 46: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/46.jpg)
VAST: Create Comparison Graph
IL-4
IL-6
3 1
4
6
12
3
5
1 2 3 4 5 6
1
2
3
4
5
4
2
5
Nodes: r13<>r12
z13<>z12
Arcs: 16<>15
must follow sequence order
Select path with highest “weights”
N
N
C
C
![Page 47: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/47.jpg)
VAST: Refinement
Aligned residues are red
Alignment extended to the end of this strand
C atoms are added to the aligned SSEs
Alignments are allowed to extend beyond SSE boundaries
All atoms are added to the models, and the detailed backbone and sidechain positions are refined
![Page 48: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/48.jpg)
VAST: Alignment of Sequence
• Aligned blocks represent structural core elements
• Aligned blocks have no internal gaps
• Aligned residues occupy the same position in space
• Aligned residues are shown in CAPITAL letters
Helix 1
Helix 2 Helix 3
Helix 4
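Because aligned residues are shown in capital letters and aligned blocks contain no internal gaps, the blocks can be recovered from one row of such an alignment by finding the runs of capitals. The sketch below assumes the row is a plain string in that convention; the sequence is made up, not taken from a VAST result.

```python
import re

def aligned_blocks(row):
    """Return (start, end, residues) for each run of capital letters; end is exclusive."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"[A-Z]+", row)]

# Hypothetical alignment row: lowercase = unaligned, CAPITALS = aligned block
row = "mkLSEELLKAgtpdVAKLIEEAsh"
for start, end, residues in aligned_blocks(row):
    print(start, end, residues)
```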
![Page 49: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/49.jpg)
VAST: Scoring
p = d × P(s > s0, n) × c(n, P1, P2)
p: The probability that the VAST alignment occurred by chance.
P(s > s0, n): Probability of observing an alignment of n SSEs with a score greater than s0 by chance.
c(n, P1, P2): Search space: number of possible alignments of n SSEs between vector sets P1 and P2.
d: Number of structures searched (set to 500)
![Page 50: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/50.jpg)
VAST: Summary
• Secondary structure elements are represented as vectors and are aligned based on their relative orientations
• VAST ignores loops and tolerates variation in SSE length
• The initial alignment is wholly ignorant of atomic coordinates
• Pathways through aligned SSEs respect sequence order
• VAST is sensitive to topology
NN N
C C
C
• Alignments are extended and optimized using all-atom models
• Aligned blocks may extend across or into loops or other SSEs
![Page 51: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/51.jpg)
Query by Chain vs 3D Domain
Query by whole chain
Query by domain 5
Not found using whole chain query!
c(n, P1, P2) is smaller for a 3D domain!
![Page 52: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/52.jpg)
VAST: Multiple Alignments Cn3D
![Page 53: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/53.jpg)
nr-PDB Sets
Entrez Structure
Choose criteria for inclusion in a set
Non-redundant set of sequence similar clusters
VAST reports one representative from each cluster
![Page 54: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/54.jpg)
Submitting a PDB File to VAST
• Pick the correct file format
• Remove all records except ATOM
• This is the best way to convert PDB into MMDB format!
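The "remove all records except ATOM" step can be done with a few lines of scripting. The sketch below follows the slide literally and keeps only lines whose record name is ATOM, which means HETATM, TER, and END records are dropped as well. The function name and the sample text are illustrative.

```python
def keep_atom_records(pdb_text):
    """Return only the ATOM lines of a PDB file's text, one per line."""
    kept = [line for line in pdb_text.splitlines() if line.startswith("ATOM")]
    return "".join(line + "\n" for line in kept)

# Abbreviated, made-up sample; real records are fixed-width
sample = "\n".join([
    "HEADER ISOMERASE/DNA 01-MAR-00 1EJ9",
    "REMARK 2 RESOLUTION. 2.60 ANGSTROMS.",
    "ATOM 1 N TRP A 203 30.156 -4.908 37.767 1.00 50.81 N",
    "HETATM 9999 O HOH A 900 0.000 0.000 0.000 1.00 30.00 O",
    "ATOM 2 CA TRP A 203 30.797 -4.667 36.431 1.00 49.96 C",
    "END",
])
print(keep_atom_records(sample))
```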
![Page 55: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/55.jpg)
Blocks in CD Alignments
Alignment view controls
Aligned query
Cn3D launch
Block 1 Block 2 Block 3
Consensus sequence created
PSSM created
![Page 56: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/56.jpg)
Curating CD Alignments
smart00235
VAST
cd00203
Cn3D
![Page 57: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/57.jpg)
Curated CD Summary
List of annotated features
Customized view of the selected feature in Cn3D
Residues comprising the selected feature
Cn3D
![Page 58: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/58.jpg)
CD-Curation: Effect on model alignment accuracy
[Figure: three panels (VAST; RPS-BLAST before curation; RPS-BLAST after curation). x-axis: %id in structure alignment (0 to 100); y-axis: model alignment RMS (0 to 12).]
A. Marchler-Bauer
![Page 59: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/59.jpg)
CDART
Only available for single domain records: cd, pfam, smart
![Page 60: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/60.jpg)
Finding a Structural Template
Overall Strategy: For a query protein sequence, construct a block alignment representing conserved core SSEs of the structures most similar in sequence to the query, and then align the query sequence to this template.
1. Construct the block alignment
A. Curated CD: Locate using CD-Search and use the sequences most similar to the query
B. VAST: Find the most sequence similar structure and find its VAST neighbors
2. Align the query to the template: Use Cn3D
A. PSI-BLAST: Aligns sequence using PSSM of current alignment
B. BLOCKER: Aligns sequence to an existing block alignment: use where sequence similarity is high
C. Threader: Aligns sequence to a structure and a block alignment: use where sequence similarity is low
![Page 61: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/61.jpg)
BLOCKER: The Block Aligner
PSSM
• Creates alignments that match the existing block structure
• Matches are scored from a PSSM generated from the block alignment
• An entire block must be matched with no internal gaps
• There are no penalties for gaps between blocks up to a set gap length
• Can perform both local and global alignments
• Generally used after BLAST or PSI-BLAST
The Block Aligner tests the existing block structure
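The rules listed for the Block Aligner (whole blocks matched without internal gaps, PSSM scoring, free gaps between blocks up to a set length) can be illustrated with a small dynamic program. This is a sketch of a global variant only, in which every block must be placed and blocks keep their order; it is not NCBI's implementation. Each block is assumed to be a list of per-column score dictionaries, and `MISMATCH` is a made-up default for residues a column does not list.

```python
MISMATCH = -4  # hypothetical score for residues missing from a PSSM column

def block_score(block, seq, start):
    """Score one block placed at seq[start:], with no internal gaps."""
    return sum(col.get(seq[start + t], MISMATCH) for t, col in enumerate(block))

def align_blocks(blocks, seq, max_gap=None):
    """Place all blocks in order on seq; return (total score, start positions) or None."""
    best = []  # best[b][s] = (best total with block b starting at s, start of block b-1)
    for b, block in enumerate(blocks):
        row = {}
        for s in range(len(seq) - len(block) + 1):
            here = block_score(block, seq, s)
            if b == 0:
                row[s] = (here, None)
                continue
            prev_len = len(blocks[b - 1])
            # The previous block must end at or before s; gaps up to max_gap are free
            candidates = [
                (score, ps) for ps, (score, _) in best[b - 1].items()
                if ps + prev_len <= s and (max_gap is None or s - (ps + prev_len) <= max_gap)
            ]
            if candidates:
                score, ps = max(candidates)
                row[s] = (here + score, ps)
        best.append(row)
    if not best or not best[-1]:
        return None
    s = max(best[-1], key=lambda pos: best[-1][pos][0])
    total = best[-1][s][0]
    starts = []
    for b in range(len(blocks) - 1, -1, -1):  # trace back through the stored starts
        starts.append(s)
        s = best[b][s][1]
    return total, starts[::-1]

# Two toy blocks and a toy sequence
blocks = [[{"A": 5}, {"C": 5}], [{"G": 5}, {"G": 5}]]
print(align_blocks(blocks, "TTACTTTGGT"))
print(align_blocks(blocks, "TTACTTTGGT", max_gap=1))
```

Restricting `max_gap` forces the blocks closer together, which in this toy case lowers the best achievable score because the two good matches lie three residues apart.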
![Page 62: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/62.jpg)
BLAST/PSSM vs BLOCKER
BLAST/PSSM
BLOCKER
Alignment
Import and align GI 1470115
![Page 63: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/63.jpg)
The NCBI Threader
LRLSLEQLQVIAIAN
Input
• Structure
• Block alignment
• Sequence
Attempts to find matches based on chemical contacts, mainly buried hydrophobic interactions
Useful on blocks for which sequence alignment methods fail
Should be iterated with varying block structures
Cn3D
![Page 64: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/64.jpg)
The Future
• More curated CDs: they keep coming…
• Pre-computed Related Structures for all sequences in Entrez Protein
• CD “children”: subfamilies of large CD records based on sequence and structure similarity
• Improved mapping of SNP data onto 3D structures
• Further linking of structural and genomic biology
![Page 65: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004](https://reader036.vdocument.in/reader036/viewer/2022081603/56649e545503460f94b4b884/html5/thumbnails/65.jpg)
What comes next…
• Workshop I
– Working with Structures
• Workshop II
– Working with Alignments
• All exercises and other resources will remain on the course web pages
• [email protected]
• NCBI Handbook, Ch. 3