Protein Structure Bioinformatics
Session 1: Introduction
Rehab Ahmed
CBSB, Faculty of Science, University of Khartoum
Faculty of Pharmacy, University of Khartoum
Introduction to Bioinformatics online course: IBT_2016
Introduction to Bioinformatics online course: IBT_2016 Protein Structural Bioinformatics, Trainer: Rehab Ahmed
Learning Objectives
• To recap some basics of amino acids and proteins
• To study the different levels of protein structures
• To shed light on how protein structures are
determined.
• To learn about some relevant databases, file formats
and file viewers.
Learning Outcomes
By the end of this session and practical, students are
expected to be able to
• Explore some resources and tools in the PDB database.
• Use some web servers to predict protein secondary structure.
Structure of Amino Acid
https://www.mun.ca/biology/scarr/iGen3_06-01.html
Aliphatic R Groups
• Name
• Three-letter code
• One-letter code
Aromatic R Groups
Sulfur-containing R Groups
Side Chains with Polar Alcohol Groups
Basic R Groups
Acidic R Groups
http://iweb.langara.bc.ca/biology/mario/Biol2315notes/biol2315chap3.html
https://online.science.psu.edu/sites/default/files/biol110/tutorial16_R_groups.jpg
Molecular interactions
Bonds and protein structures
https://researchpeptides.com/images/misc/peptide-bond-animation.gif
Intermolecular Forces
• Dipole interactions
• Hydrogen bonds
• van der Waals forces
• hydrophobic interactions
• Others.
http://www.chem.ucla.edu/~harding/IGOC/D/disulfide_bridge.html
https://researchpeptides.com/images/misc/Structures-Proteins.jpg
Structure is encoded in the sequence!
• Anfinsen's dogma
Christian B. Anfinsen (1916–1995), U.S. biochemist; Nobel Prize in Chemistry, 1972.
• Anfinsen CB. "Principles that Govern the Folding of Protein Chains". Science, 20 Jul 1973, Vol. 181, Issue 4096, pp. 223-230.
https://online.science.psu.edu/biol011_sandbox_7239/node/7390
Secondary structure
α-helix
Linus Pauling (1901-1994), Nobel Prizes in Chemistry and Peace
Other types of helices
• Alpha helix: backbone hydrogen bonds between residues (i, i+4)
• Others:
  - 3-10 helix: (i, i+3)
  - π-helix: (i, i+5)
https://en.wikipedia.org/wiki/File:Pi-helix_within_an_alpha-helix.jpg
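The (i, i+n) notation names the backbone hydrogen-bond partners: the C=O of residue i accepts a hydrogen bond from the N-H of residue i+n. A minimal Python sketch that lists those pairs for an idealized stretch of residues; the function and dictionary names are illustrative and not taken from the slides:

```python
# Offset n in the (i, i+n) backbone hydrogen-bond pattern of each helix type.
HELIX_OFFSETS = {"3-10 helix": 3, "alpha helix": 4, "pi helix": 5}


def hbond_pairs(n_residues, helix_type):
    """Return 1-based (i, i+n) residue pairs for an ideal helix of n_residues."""
    offset = HELIX_OFFSETS[helix_type]
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


for name in HELIX_OFFSETS:
    print(name, hbond_pairs(8, name))
```

A stretch shorter than the offset yields no pairs, which is one way to see why very short segments cannot form a full helical turn.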
Beta Strands (β-strands)
Parallel and anti-parallel Beta sheets
Hairpin
Crossover
Loops/turns
Motifs in Proteins (Super-Secondary Structure)
• http://swift.cmbi.ru.nl/gv/students/mtom/hmotif.jpg
• Psi-loop
https://en.wikipedia.org/wiki/File:5CPAgood.png
DSSP (Dictionary of protein secondary structure)
• Defines standard criteria for assigning secondary structure.
• Programmed as a pattern-recognition process of hydrogen-bonded and geometrical features extracted from X-ray coordinates.
• Kabsch W, Sander C (1983). "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features". Biopolymers. 22 (12): 2577–637
DSSP (Helix, Strand and loops)
Secondary structure       Symbol
Alpha helix               H
3-10 helix                G
π-helix                   I
Beta bridge               B
Beta strand               E
Turn                      T
High curvature (bend)     S
No rule applies (coil)    blank or C
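DSSP's one-letter codes (H for alpha helix, G for 3-10 helix, I for π-helix, E for strand, B for bridge, T, S and blank for the rest) are often collapsed into three states: helix, strand and coil. The sketch below uses one common grouping; other groupings exist, and the mapping and function name here are assumptions rather than part of DSSP itself:

```python
# One common 8-state to 3-state reduction (an assumption; conventions vary).
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "C": "C", " ": "C", "-": "C",  # everything else
}


def reduce_dssp(ss):
    """Map a DSSP string to helix (H), strand (E) or coil (C)."""
    return "".join(DSSP_TO_3STATE.get(symbol, "C") for symbol in ss)


print(reduce_dssp("HHGG TTEEB S"))
```

Unknown symbols fall back to coil, which keeps the output the same length as the input.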
Experimental determination of Secondary Structure
• Spectroscopy:
  - UV circular dichroism (CD)
  - IR spectroscopy
  - NMR
http://www.ap-lab.com/images/CD_STANDARDS.gif
Secondary structure prediction
• Early/empirical methods:
  - Based on probabilities and pre-computed residue preferences.
  - Chou-Fasman method (~60% accurate)
    Chou PY, Fasman GD (Jan 1974). "Prediction of protein conformation". Biochemistry. 13 (2): 222–245.
  - CFSSP: Chou & Fasman Secondary Structure Prediction Server
    http://www.biogem.org/tool/chou-fasman/
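Accuracy figures such as the ~60% quoted for Chou-Fasman are usually per-residue scores over three states (helix, strand, coil), often called Q3. Assuming that is the measure meant here, a minimal sketch of how such a score is computed (the function name is illustrative):

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 8-residue example: one residue is mispredicted.
print(q3_accuracy("HHHEECCC", "HHHHECCC"))
```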
• For instance, helical propensity of residue type X
• Pα(X) = frequency (X in helix) / frequency (X)
• Pα > 1 = favours helix (e.g., Pα(Glu)=1.51)
• Pα < 1 = disfavours helix (e.g., Pα(Gly)=0.57)
Gerard J. Kleywegt’s slide
• Database of 2000 residues
• 100 are Alanines
• 500 residues are in a helix
• 50 alanines are in a helix
• What is the propensity for Ala to be in a helix? Is Ala a good helix former?
Gerard J. Kleywegt’s slide
• Pα(X) = frequency (X in helix) / frequency (X)
• Pα (Ala) = freq (Ala, α) / freq (Ala)
• freq (Ala, α) = 50/500 = 0.1
• freq (Ala) = 100/2000 = 0.05
• Pα (Ala) = 0.1/0.05 = 2.0
• Ala is a good helix former!
Gerard J. Kleywegt’s slide
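The Ala calculation can be reproduced in a few lines of Python. This is a sketch of the propensity formula exactly as defined on the slide; the function and argument names are illustrative:

```python
def helix_propensity(n_x_in_helix, n_helix, n_x, n_total):
    """P_alpha(X) = freq(X in helix) / freq(X), computed from raw counts."""
    freq_x_in_helix = n_x_in_helix / n_helix   # share of helix residues that are X
    freq_x = n_x / n_total                     # share of all residues that are X
    return freq_x_in_helix / freq_x


# Counts from the worked example: 2000 residues, 100 Ala, 500 in helix, 50 Ala in helix.
p_ala = helix_propensity(n_x_in_helix=50, n_helix=500, n_x=100, n_total=2000)
print(p_ala, "favours helix" if p_ala > 1 else "disfavours helix")
```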
• Current machine learning-based methods employ information from multiple sequence alignments, information theory, and machine learning algorithms such as artificial neural networks and Bayesian networks, or a combination of those.
• E.g. PSIPRED:
• http://bioinf.cs.ucl.ac.uk/psipred/
Tertiary structure
• The tertiary structure is the final specific geometric shape that a protein assumes.
• It is determined by a variety of bonding interactions between the side chains of the amino acids.
• Bonds involved: hydrogen bonds, salt bridges, disulfide bonds, and non-polar hydrophobic interactions.
http://chemistry.elmhurst.edu/vchembook/567tertprotein.html
Methods of 3D structure Determination
Information on 3D structure can be obtained by
• X-ray crystallography,
• NMR spectroscopy, or
• Cryo-electron microscopy.
The resulting structures are submitted to the PDB by biologists and biochemists from around the world and are freely accessible on the Internet via the websites of the wwPDB member organizations.
X-ray Crystallography
• According to the Online Dictionary of Crystallography, the term resolution describes the ability to distinguish between neighboring features in an electron density map.
• The R factor is one measure of model quality: it reflects how well the intensities calculated from the model agree with the observed ones. Values range from about 0 to 0.6, and lower is better.
• An R factor above 0.5 indicates a poor-quality model.
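For reference, the R factor is conventionally computed from structure-factor amplitudes (the square roots of the measured intensities) as the sum of |Fobs - Fcalc| differences divided by the sum of Fobs. A minimal sketch under that standard definition; the amplitude values below are made up for illustration:

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections.
observed = [100.0, 80.0, 60.0, 40.0]
calculated = [90.0, 85.0, 55.0, 45.0]
print(round(r_factor(observed, calculated), 3))
```

A model that reproduces the observed amplitudes exactly gives R = 0, which is why lower values indicate better agreement.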
Resolution   Evaluation   Interpretation
1.2 Å        Excellent    Backbone and most side chains very clear. Some hydrogens may be resolved.
2.5 Å        Good         Backbone and many side chains clear.
3.5 Å        OK           Backbone and bulky side chains.
5.0 Å        Poor         Backbone mostly clear; side chains not clear.
http://proteopedia.org/wiki/index.php/Resolution
Databases
wwPDB
RCSB PDB
• Repository of information about the 3D structures of large biological molecules.
• Was established in 1971 at Brookhaven National Laboratory
• Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB in 1998
PDB ID(s)
• A 4-character ID, e.g. 8CAT
• Unique, immutable identifier.
• The IDs are automatically assigned and do not
have meaning.
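A quick format check for such IDs can be sketched with a regular expression. The pattern below assumes the classic convention of a leading digit followed by three alphanumeric characters; that convention is an assumption not stated on the slide, and the function names are illustrative:

```python
import re

# Assumed classic format: one digit, then three letters or digits.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_pdb_id(text):
    """Return True if text looks like a classic 4-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


def normalize_pdb_id(text):
    """Upper-case a valid ID so that '8cat' and '8CAT' compare equal."""
    if not is_pdb_id(text):
        raise ValueError(f"not a 4-character PDB ID: {text!r}")
    return text.upper()


for candidate in ["8CAT", "8cat", "101M", "CAT8", "8CATX"]:
    print(candidate, is_pdb_id(candidate))
```

The check is purely syntactic: it says nothing about whether an entry with that ID actually exists.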
Domains
• The domain is the basic building block of a protein structure:
  1. A spatially separated unit of the protein structure.
  2. May have sequence and/or structural resemblance to another protein structure or domain.
  3. May have a specific function associated with it.
http://www.proteinstructures.com/Structure/Structure/protein-domains.html
Pfam
• Pfam 30.0
• 16306 entries (June 2016).
• Information about protein families, each represented by a hidden Markov model (HMM).
• Annotations.
• Links to other databases: RCSB PDB, CATH, SCOP, Proteopedia, etc.
CATH
The domains are classified within the CATH structural hierarchy:
• Class (C) level: classification based on secondary structure content, i.e. all alpha, all beta, a mixture of alpha and beta, or little secondary structure.
• Architecture (A) level: based on the arrangement of secondary structures in three-dimensional space.
• Topology/fold (T) level: how the secondary structure elements are connected and arranged.
• Homologous superfamily (H) level: assigned when there is good evidence that the domains are related by evolution, i.e. they are homologous.
• http://www.cathdb.info/wiki
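CATH identifiers are commonly written as four dot-separated numbers, one per level (C.A.T.H). The sketch below parses such an identifier; the dotted notation, the class numbering 1 to 4 and the example identifier are assumptions that are not shown on the slide, and all names are illustrative:

```python
from typing import NamedTuple

# Assumed numbering of the main classes, in the order the Class level lists them.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha beta",
    4: "few secondary structures",
}


class CathId(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath_id(text):
    """Split a 'C.A.T.H' string into its four numeric levels."""
    parts = text.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {text!r}")
    return CathId(*(int(p) for p in parts))


def level_prefixes(cath_id):
    """Return the identifier truncated at the C, A, T and H levels."""
    numbers = [str(n) for n in cath_id]
    return [".".join(numbers[:depth]) for depth in range(1, 5)]


cid = parse_cath_id("3.40.50.720")  # example identifier, used for illustration only
print(CLASS_NAMES.get(cid.cath_class, "other"))
print(level_prefixes(cid))
```

Two domains that share a longer prefix are closer in the hierarchy, so comparing prefixes is a cheap way to ask at which level they diverge.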
CATH v4.1
PDB Release 01-01-2015
Domains 308999
Superfamilies 2737
Annotated PDBs 108378
Proteopedia
• Wiki web resource whose pages have embedded three-dimensional structures surrounded by descriptive text.
• http://proteopedia.org/wiki/index.php/Main_Page
File formats
• Sequence files: FASTA
• Secondary structure files: FASTA-formatted file ("ss.txt")
• PDB entry files (PDB, PDBx/mmCIF, XML)
• Small molecule files (PDB, CIF, SDF, ...)
• Large structures represented in PDBx/mmCIF (containing >62 chains and/or >99999 ATOM records)
FASTA-formatted file ("ss.txt")
>101M:A:sequence
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRVKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGYQG
>101M:A:secstr
HHHHHHHHHHHHHHGGGHHHHHHHHHHHHHHH GGGGGG TTTTT SHHHHHH HHHHHHHHHHHHHHHHHHTTTT HHHHHHHHHHHHHTS HHHHHHHHHHHHHHHHHH GGG SHHHHHHHHHHHHHHHHHHHHHHHHTT
>102L:A:sequence
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
>102L:A:secstr
HHHHHHHHH EEEEEE TTS EEEETTEEEESSS TTTHHHHHHHHHHTS TTB HHHHHHHHHHHHHHHHHHHHH TTHHHHHHHS HHHHHHHHHHHHHHHHHHHHT HHHHHHHHTT HHHHHHHHHSSHHHHHSHHHHHHHHHHHHHSSSGGG
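Records in this layout pair a "sequence" entry with a "secstr" entry for each PDB ID and chain, with blanks in the secstr line standing for residues that have no assigned structure. A minimal parsing sketch, assuming every header has the form >ID:CHAIN:kind; the function name and the short example record (ID 1ABC) are hypothetical:

```python
def parse_ss_text(text):
    """Group records by (PDB ID, chain); blanks in 'secstr' lines are preserved."""
    entries = {}
    current = None
    kind_key = None
    for line in text.splitlines():
        if line.startswith(">"):
            pdb_id, chain, kind = line[1:].strip().split(":")
            current = entries.setdefault((pdb_id, chain), {})
            kind_key = kind
            current[kind_key] = ""
        elif current is not None:
            # Do not strip: leading and trailing blanks are coil positions.
            current[kind_key] += line
    return entries


# Hypothetical 10-residue record; each secstr character annotates one residue.
EXAMPLE = "\n".join([
    ">1ABC:A:sequence",
    "MKTAYIAKQR",
    ">1ABC:A:secstr",
    "  HHHHHH  ",
])

record = parse_ss_text(EXAMPLE)[("1ABC", "A")]
for residue, state in zip(record["sequence"], record["secstr"]):
    print(residue, state if state != " " else "-")
```

Because whitespace carries meaning in the secstr line, copies of the file in which runs of blanks were collapsed will no longer line up residue by residue with the sequence.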
PDB File formats
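The legacy PDB format is column-based: each ATOM record stores its fields at fixed character positions. The sketch below reads a few of those fields using the column layout of the PDB format specification as I understand it (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54, element 77-78). The sample record and its coordinates are made up, and the class and function names are illustrative:

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line):
    """Parse one fixed-width ATOM/HETATM record (0-based slices of 1-based columns)."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )


# Hypothetical record, assembled field by field so the column widths are explicit.
SAMPLE = (
    "ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A" + "   1" + "    "
    + "  27.340" + "  24.430" + "   2.614" + "  1.00" + "  9.67" + " " * 10 + " N"
)

atom = parse_atom_line(SAMPLE)
print(atom)
```

Splitting on whitespace instead of slicing by column is a common mistake with this format, because adjacent fields may touch when values are wide.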
Molecular Graphics Software
• Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
• iCn3D http://www.ncbi.nlm.nih.gov/Structure/icn3d/docs/icn3d_about.html
• UCSF Chimera http://www.cgl.ucsf.edu/chimera/index.html
• Visual molecular dynamics (VMD) http://www.ks.uiuc.edu/Research/vmd/
• PyMOL https://www.pymol.org/
• Etc…
Molecular Representation
• What do we mean by structural bioinformatics?
• Why protein structure bioinformatics?
• Structural bioinformatics is a branch of bioinformatics that deals with the structure of biological macromolecules: DNA, RNA and proteins. ("Deals with" covers analysis, storage, visualization, prediction, etc.)
Structural bioinformatics
• Proteins are the building blocks of all cells.
• In the world of proteins: structure = function!?
• DNA encodes life, yes! But proteins carry out life processes: replication, reproduction, defense, etc.
Why Protein Structure bioinformatics
• This first SB session is meant to cover some basics and fundamentals and to get us all on the same page.
Resources/References
• The Anatomy and Taxonomy of Protein Structure (by Jane S. Richardson)
http://kinemage.biochem.duke.edu/teaching/anatax/
• http://www.rcsb.org/
• http://sbkb.org/
• http://www.proteinstructures.com/index.html
• http://proteopedia.org/wiki/index.php/Main_Page