The Protein Data Bank Europe (PDBe) (http://www.pdbe.org/) is one of the
worldwide partners that manage the Protein Data Bank (PDB), a collection of all publicly
available 3-dimensional structures of biological macromolecules. The Worldwide Protein
Data Bank (wwPDB) consists of organizations that act as deposition, data processing and
distribution centres for PDB data. The founding members are RCSB PDB (USA), PDBe
(Europe) and PDBj (Japan). The mission of the wwPDB is to maintain a single Protein
Data Bank Archive of macromolecular structural data that is freely and publicly available
to the global community.
In addition to its role in data deposition, processing and distribution, the PDBe is
also involved in the creation of a relational database that integrates data available from
experimentally determined protein structures with protein sequence information, textual
information from scientific publications and a number of derived properties that augment
the macromolecular structure information. The database also contains information on a
variety of ligands, cofactors and smaller chemical entities that interact with a protein. The
PDBe group has also developed several algorithms for analysis of protein structures. In
addition, information from 3D electron microscopy is stored and can be accessed from
the database.
Introduction to Protein Structures
(Expected Time for completion: 1 hour)
This exercise will cover the basic types of protein structures as represented in the Protein
Data Bank and an introduction to the PDBe entry information pages and some search and
analysis services. For a detailed explanation of protein structure components, please see
this excellent introduction in Wikipedia. Fold refers to a global type of arrangement, like
helix-bundle or beta-barrel. Although there are now over 65,000 experimentally
determined structures in the PDB, the number of unique folds that these proteins adopt is
limited, and all proteins can be classified into one or more fold categories, which are
annotated in databases like CATH and SCOP. Similar functions are often
associated with particular folds, and the fold classification therefore serves as
an important tool in understanding the possible function of a protein.
Alpha-helix proteins: There are many different families of proteins which are
composed of only alpha-helices. Please see extra details here. Some examples to explore
are given below.
PDB Entry: 1IRD
Start with the PDBe home page (http://www.pdbe.org/), and in the space provided for Get
PDB by id, type in 1ird, and click on “Go”.
The browser will take you to the entry summary page for this PDB entry, 1IRD. Every
entry in the Protein Data Bank is assigned a unique 4-character "IDCODE". The summary
pages provide information concerning various facets of the deposited structure, including
links to external sites and other information derived from the structure itself. Underlined
text on the summary page links to external sites, and to search for a particular item in the
whole PDB, click on the lens icons.
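As a small illustration of the ID code format, the sketch below checks a candidate code and builds the short entry URL in the form used in this exercise (for example http://www.pdbe.org/1w6o). It assumes the classic convention of one digit from 1 to 9 followed by three letters or digits; the function names are hypothetical and are not part of any PDBe service.

```python
import re

# Assumption: classic PDB ID codes are one digit (1-9) followed by
# three letters or digits, e.g. 1IRD or 2GAL.
PDB_ID_PATTERN = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def is_valid_pdb_id(code: str) -> bool:
    """Return True if `code` looks like a classic 4-character PDB ID code."""
    return bool(PDB_ID_PATTERN.match(code.strip()))


def entry_url(code: str) -> str:
    """Build the short entry URL in the form used in this exercise."""
    code = code.strip()
    if not is_valid_pdb_id(code):
        raise ValueError(f"not a valid PDB ID code: {code!r}")
    return "http://www.pdbe.org/" + code.lower()


print(entry_url("1IRD"))
```

ID codes are case-insensitive when typed into the search box, which is why the sketch lower-cases the code before building the URL.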
Coming back to 1IRD, this structure is of human haemoglobin bound to carbon
monoxide. Choose the visualisation link from the sidebar on the left, then choose
“View the PDB entry using Astex viewer” to view the structure interactively.
This will open up a graphics window as shown on the right. The display is interactive and
you can rotate the structure, or center on any residue by clicking on the sequence on the
bottom bar. Click on “Reset View” to zoom out. Looking at the structure, you will notice
that this protein is composed only of alpha-helices. To see the bound heme, choose
“Magic Lens” from the menu and move your mouse over the structure. To see which
residues of the protein interact with the heme, choose “Chemistry” and click on either
of the two HEM groups shown. You may click on any of the residues shown in the chemistry
popup to center the structure on that residue. The sliders at the top of the chemistry
popup allow adjustment of the distance cutoff between 0 and 4 Å from the ligand to see specific
interactions.
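The distance slider can be thought of as a simple cutoff filter: a residue is shown if any of its atoms lies within the chosen distance of any ligand atom. The sketch below illustrates that idea only. The coordinates and residue labels are made up for illustration and do not come from 1IRD, and the viewer's actual selection logic may differ.

```python
import math


def residues_within(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return sorted residue labels that have at least one atom within
    `cutoff` angstroms of any ligand atom.

    ligand_atoms:  list of (x, y, z) tuples
    protein_atoms: list of (residue_label, (x, y, z)) tuples
    """
    close = set()
    for residue, p_xyz in protein_atoms:
        for l_xyz in ligand_atoms:
            if math.dist(p_xyz, l_xyz) <= cutoff:
                close.add(residue)
                break  # one close atom is enough for this residue
    return sorted(close)


# Hypothetical coordinates in angstroms, for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [
    ("HIS 1", (0.0, 2.1, 0.0)),
    ("LEU 2", (4.0, 2.0, 0.0)),
    ("PHE 3", (0.0, 0.0, 9.0)),
]

print(residues_within(ligand, protein, cutoff=4.0))
print(residues_within(ligand, protein, cutoff=2.5))
```

Lowering the cutoff narrows the list to the closest contacts, which mirrors what happens when the slider is moved towards 0.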
To look at the sequence section on the atlas page for this entry, click 'Structure' and
then 'Primary' in the sidebar, which will show you the sequence alignment with UniProt. You
can also view the Pfam classification for this family of proteins (Globin).
The structural classification for this protein is available from the Tertiary Structure section of
the sidebar. Both the SCOP and CATH databases suggest that this protein is a member of the
all-alpha-helical globin family.
Links to all other cross-referenced external databases are listed under 'Cross references' in
the sidebar. Look at the GO (Gene Ontology) reference here, which lists both the
processes and functions this protein is involved in. The “Ligands” section of the summary
pages provides additional information on the compounds that this protein is associated with in this
entry. Click on the “Ligands” link on the sidebar and then the "interactions" link to view all
interactions between the heme group and the protein in this entry. This will open up a
new browser window/tab to show this information.
This will take you to our PDBeMotif service and will show that HEM in this structure
interacts with Histidine, CMO (Carbon monoxide), Tyrosine and others. All interactions
are colour-coded to indicate the nature of the interaction.
Go back to the summary pages now. To assess the quality of this structure (1ird), expand
the “Links” section from the left-hand sidebar and choose “PDBe Validation”. This will
open a new window/tab and show you the geometric quality of the structure
(Ramachandran plot, bond angles, bond lengths, etc.). Please see here for an explanation
of the Ramachandran plot.
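A Ramachandran plot shows, for each residue, the backbone dihedral angle phi (atoms C of the previous residue, N, CA, C) against psi (atoms N, CA, C, N of the next residue). The sketch below shows one common way to compute a dihedral angle from four atom positions. It is an illustration under stated assumptions, not the method used by the PDBe validation service, and the coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    For phi pass (C of previous residue, N, CA, C);
    for psi pass (N, CA, C, N of next residue).
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal to the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical positions for four consecutive backbone atoms.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"dihedral = {angle:.1f} degrees")
```

Computing phi and psi this way for every residue and plotting one against the other gives the scatter that the validation page displays.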
To see if there are any other structures in the PDB that are similar to this one, from the
same “Links” menu on the summary page, choose the “PDBe SSM” link. This will start
the PDBeSSM service that provides a rapid structure alignment and comparison tool.
This job may take a few minutes to complete. Once the task is completed you will be
shown a page containing the results. Scroll to the bottom of this page and re-sort the
results by “%seq” as shown below.
The page will refresh. Now choose the “Last page” button from the top of this page.
Let's look at the last result on the last page. The columns of data tell us that this particular
structure has only 7% sequence identity to our haemoglobin but shares 75% structural
identity. Click on the left-hand link for this result to see the details page.
Click on the “View Superposed” button to show the two structures aligned with respect to
each other. This will open up a graphical window.
You can see from the alignment that the two proteins are both made up of alpha-helices
arranged in a similar orientation and yet have minimal sequence identity.
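To make the idea of sequence identity concrete, the sketch below counts identical residues over the aligned (non-gap) positions of two sequences. This is one common definition and is an assumption here: the exact way PDBeSSM computes its “%seq” column may differ. The two aligned sequences are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over aligned (non-gap) positions.

    Both strings must come from the same alignment, so they have equal
    length, with '-' marking a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


# Hypothetical aligned fragments, for illustration only.
seq_a = "MVLSPADKTN"
seq_b = "MV-SGAEKAN"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Two structures can superpose well in three dimensions even when this number is very low, which is the point the haemoglobin comparison above makes.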
Let's now move on and explore some other predominant secondary structure folds present
in protein structures.
Beta-sheet proteins: These proteins are composed of only beta-sheets, the other
characteristic secondary structure element in proteins. This group is fairly large and
comprises proteins with widely varying functions, from sugar-binding to metabolic
transport to antibodies. Some examples are given below for you to explore. In each
example below, look at the structure as above, and pay attention to the Pfam,
CATH and SCOP classifications for each entry to get a feel for the structure.
PDB entry: 1A0S
This protein is a beta-barrel protein and is involved in the transport of maltodextrin across
the outer membrane of Gram-negative bacteria. Other proteins share a similar topology with
this protein. More information about related proteins can be seen from the Pfam entry.
Essentially, this family is composed of proteins that are collectively called porins. Look at
the GO and Pfam entries for this protein.
Explore the various pages for this entry. See the Primary, Secondary and Tertiary
structures for this protein. You'll see that the secondary structure of this protein is
predominantly beta sheets. You may also read the abstract of the paper where this
structure was described (the “Citations” link from the sidebar).
Let's answer a few questions!
Question 1: What compound/s is this protein associated with and what are the
interactions between the compound/s and the protein? (HINT: Look at the ligands page!).
Answer:
Question 2: Which other entries in the PDB are of the same protein? (HINT: The lens
symbol next to the UniProt identifier on the summary page will do a search for all other
entries in the PDB that contain the same sequence).
Answer:
Question 3: One of the authors for this entry is K. Diederichs (Authors section on the
summary page). How many other structures in the PDB have this person as an author?
Answer:
PDB entry: 1BKZ
This is a structure of a protein called galectin-7. This protein belongs to a specific family of
proteins called galectins (or S-lectins). The name derives from the fact that almost all members
of this family of proteins bind to galactoside sugars. This protein belongs to a different
family of all-beta-sheet proteins (SCOP entry). Look at the various links from the sidebar
for this entry for more information and try to answer the questions below.
Question 1: What is the biological function of this protein as described by GO? (HINT: See
the Cross-References section for the GO classification.)
Answer:
Question 2: This structure does not appear to be bound to any ligand, but are there any
other PDB entries for the same protein that have bound ligands? Which sugars do the
other entries for the same protein associate with? (Hint: Search for other entries that
contain the same UniProt sequence and see their titles and summary pages!)
Answer:
Question 3: Look at the ligands page for PDB entry 1w6o (http://www.pdbe.org/1w6o).
Is there anything in common between the ligand interactions of LAT (alpha-lactose) in
1w6o and GAL (beta-D-galactose) in 2gal (http://www.pdbe.org/2gal)? (Hint: Look at
the interactions of GAL in PDB entry 2GAL and the interactions of LAT in 1w6o
from the ligand sections for each of the entries.)
Answer:
Alpha-Beta proteins: This is the most populous category in protein fold classification.
(Link to SCOP (a/b) and Link to SCOP (a+b) ). SCOP has a total of 415 classes of
proteins that are composed of alpha helices and beta-sheets in different topologies. Most
enzymes fall into one of these families. Let us look at a few examples.
PDB entry: 1AFL
1AFL is the structure of pancreatic ribonuclease from Bos taurus (cattle). Ribonucleases
make up a large family of proteins with similar enzymatic functions and structures and
include members that are implicated in angiogenesis (blood vessel growth in cancers).
Ribonucleases essentially cleave RNA. Read more about the function of ribonucleases
from the InterPro and Pfam entries for this structure from the “Cross references” on the
sidebar.
Over 150 structures of pancreatic ribonuclease have been
determined in complex with various enzymatic inhibitors. Look at the “Ligands” page for
this entry. This protein is bound to a compound called ATR, which is a modified
ribonucleotide that binds to the active site and inhibits the activity of the enzyme. View
the interactions of ATR with 1AFL.
Under “Links” in the sidebar, click on “PDBe SSM” to compare the 1AFL fold with the
rest of the PDB. Once the results are available, choose "Sort by Seq%" from the
bottom of the page and wait for the page to reload. From the bottom right of the page,
choose the "Last Page" button to go to the last page. Look for 2i5s in the results, which
has 29% sequence identity and 78% identity in structure. Click on the link on the right
side and, in the details results page, choose “View superposed”. This will open a
graphics window and show the structural alignment of 1AFL with 2I5S (an onconase).
You can rotate the aligned structures. The two structures are very similar in fold but share
very low sequence identity.
Now look at the atlas pages for 2I5S (http://www.pdbe.org/2i5s) and answer the
following questions.
Question 1: What are the similarities in function between 1AFL and 2I5S? (Hint: Look
at the GO entries for both structures!)
Answer:
Question 2: Which residues from the protein interact with ligand ATR by hydrogen-bond
interactions in PDB entry 1AFL?
Answer:
Question 3: This protein is from Bos taurus. How many protein structures in the PDB
have been determined from the same source? (HINT: Click on the lens symbol next to
the organism on the summary page for PDB entry 1AFL.)
Answer:
Question 4: What is the EC number for this class of protein (HINT: Summary Page),
and how many structures of the same enzyme class have been determined? (HINT: Lens!)
Answer:
PDB entry: 1KWW
Look at the summary page for this entry (http://www.pdbe.org/1kww), the structure of a
mannose-binding protein from rat. These sugar-binding proteins are characterized by
binding mannoside sugars in the presence of metal ions such as calcium (hence called
c-lectins). Look over the structure carefully and try to answer the questions below.
Question 1: How many structures for this protein are present in the PDB archive?
Answer:
Question 2: Which protein residues interact with the calcium (CA) ions present in the
structure, and what is the predominant nature of this interaction?
Answer:
Question 3: Compare the binding site of MFU in PDB entry 1KWW
(http://www.pdbe.org/1kww) with the binding site of FUC in PDB entry 3KMB
(http://www.pdbe.org/3kmb). Can you identify the binding site for sugars by looking at
the protein residues that interact with the ligand in both cases? (Hint: Look at the ligand
interactions for both MFU in 1KWW and FUC in 3KMB by looking at the atlas pages for
both these entries.)
Answer: