Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction
Post on 13-Jan-2016
Centre for Integrative Bioinformatics VU
Lecture 13: Protein Function
Sequence-Structure-Function
[Diagram relating Sequence, Structure and Function. Labels: threading; homology searching (BLAST); ab initio prediction and folding; function prediction from structure. Difficulty annotations: "impossible but for the smallest structures" and "very difficult".]
TERTIARY STRUCTURE (fold)
Genome
Expressome
Proteome
Metabolome
Functional Genomics – Systems Biology
Metabolomics
fluxomics
Systems Biology
is the study of the interactions between the components of a biological system, and how these interactions give rise to the function and behaviour of that system (for example, the enzymes and metabolites in a metabolic pathway). The aim is to understand the system quantitatively and to be able to predict how it behaves over time.
• The interactions are nonlinear
• The interactions give rise to emergent properties, i.e. properties that cannot be explained by the individual components of the system alone
• Biological processes span many time-scales, many compartments and many interconnected network levels (e.g. regulation, signalling, expression, ...)
Systems Biology
understanding is often achieved through modeling and simulation of the system’s components and interactions.
Many times, the ‘four Ms’ cycle is adopted:
Measuring
Mining
Modeling
Manipulating
‘The silicon cell’
(some people think ‘silly-con’ cell)
A system response
Apoptosis: programmed cell death
Necrosis: accidental cell death
This pathway diagram compares pathways in Homo sapiens (human, left) and Saccharomyces cerevisiae (baker’s yeast, right). Both the controlling enzymes (square boxes in red) and the pathway itself have changed: yeast has one altered (‘overtaking’) path in the graph.
We need to be able to do automatic pathway comparison (pathway alignment).
Human Yeast
‘Comparative metabolomics’
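As a rough illustration of what automatic pathway comparison involves, the sketch below treats each pathway as a set of reaction steps and reports which steps are shared. This is plain set comparison, a much simpler stand-in for true pathway alignment, and the metabolite names are hypothetical placeholders rather than entries from the diagram.

```python
# Hypothetical toy pathways: each step is a (substrate, product) pair.
# The metabolite names are placeholders, not taken from the slide's diagram.
human = {("A", "B"), ("B", "C"), ("C", "D")}
yeast = {("A", "B"), ("B", "X"), ("X", "D")}


def compare_pathways(p, q):
    """Return the steps shared by both pathways and the steps unique to each."""
    return {
        "shared": p & q,
        "only_first": p - q,
        "only_second": q - p,
    }


def jaccard(p, q):
    """Fraction of steps shared: |p & q| / |p | q| (1.0 means identical)."""
    union = p | q
    return len(p & q) / len(union) if union else 1.0


result = compare_pathways(human, yeast)
print(sorted(result["shared"]))
print(sorted(result["only_second"]))
print(round(jaccard(human, yeast), 2))
```

A real alignment method would also need to score substituted enzymes and rerouted paths, which a set difference cannot express.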
The citric-acid cycle
http://en.wikipedia.org/wiki/Krebs_cycle
The citric-acid cycle
Fig. 1. (a) A graphical representation of the reactions of the citric-acid cycle (CAC), including the connections with pyruvate and phosphoenolpyruvate, and the glyoxylate shunt. When there are two enzymes that are not homologous to each other but that catalyse the same reaction (non-homologous gene displacement), one is marked with a solid line and the other with a dashed line. The oxidative direction is clockwise. The enzymes with their EC numbers are as follows: 1, citrate synthase (4.1.3.7); 2, aconitase (4.2.1.3); 3, isocitrate dehydrogenase (1.1.1.42); 4, 2-ketoglutarate dehydrogenase (solid line; 1.2.4.2 and 2.3.1.61) and 2-ketoglutarate ferredoxin oxidoreductase (dashed line; 1.2.7.3); 5, succinyl-CoA synthetase (solid line; 6.2.1.5) or succinyl-CoA–acetoacetate-CoA transferase (dashed line; 2.8.3.5); 6, succinate dehydrogenase or fumarate reductase (1.3.99.1); 7, fumarase (4.2.1.2) class I (dashed line) and class II (solid line); 8, bacterial-type malate dehydrogenase (solid line) or archaeal-type malate dehydrogenase (dashed line) (1.1.1.37); 9, isocitrate lyase (4.1.3.1); 10, malate synthase (4.1.3.2); 11, phosphoenolpyruvate carboxykinase (4.1.1.49) or phosphoenolpyruvate carboxylase (4.1.1.32); 12, malic enzyme (1.1.1.40 or 1.1.1.38); 13, pyruvate carboxylase or oxaloacetate decarboxylase (6.4.1.1); 14, pyruvate dehydrogenase (solid line; 1.2.4.1 and 2.3.1.12) and pyruvate ferredoxin oxidoreductase (dashed line; 1.2.7.1).
M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol, 7, 281-29 (1999)
The citric-acid cycle
M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol, 7, 281-29 (1999)
(b) Individual species might not have a complete CAC. This diagram shows the genes for the CAC for each unicellular species for which a genome sequence has been published, together with the phylogeny of the species. The distance-based phylogeny was constructed using the fraction of genes shared between genomes as a similarity criterion (ref. 29 in Huynen et al.). The major kingdoms of life are indicated in red (Archaea), blue (Bacteria) and yellow (Eukarya). Question marks represent reactions for which there is biochemical evidence in the species itself or in a related species but for which no genes could be found. Genes that lie in a single operon are shown in the same color. Genes were assumed to be located in a single operon when they were transcribed in the same direction and the stretches of non-coding DNA separating them were less than 50 nucleotides in length.
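The operon rule stated in the caption (same transcription direction, fewer than 50 non-coding nucleotides between neighbours) can be sketched as follows. The `Gene` record, the function name and the example coordinates are assumptions made for illustration; they do not come from the paper.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # first nucleotide (1-based, inclusive)
    end: int     # last nucleotide (inclusive)
    strand: str  # "+" or "-"


def group_into_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group neighbours on the same strand separated by < max_gap non-coding nt."""
    operons: List[List[str]] = []
    previous = None
    for gene in sorted(genes, key=lambda g: g.start):
        # Number of nucleotides strictly between the two genes.
        gap = gene.start - previous.end - 1 if previous else None
        if previous and gene.strand == previous.strand and gap < max_gap:
            operons[-1].append(gene.name)
        else:
            operons.append([gene.name])
        previous = gene
    return operons


# Hypothetical coordinates, for illustration only.
genes = [
    Gene("geneA", 100, 1000, "+"),
    Gene("geneB", 1020, 2000, "+"),  # 19 nt after geneA, same strand
    Gene("geneC", 2100, 3000, "+"),  # 99 nt after geneB
    Gene("geneD", 3010, 3900, "-"),  # close to geneC but opposite strand
]
print(group_into_operons(genes))
```

Overlapping genes give a negative gap and are therefore grouped as well, which is one possible reading of the rule.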
Experimental
• Structural genomics
• Functional genomics
• Protein-protein interaction
• Metabolic pathways
• Expression data
Communicability: Functional Genomics
• Interpretation of genome-scale gene expression data
External program
DNA-chip data
Cluster of coregulated genes: gene 1, gene 2, ..., gene n
PFMP query
Pathways affected: pathway 1, pathway 2
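The step from a cluster of coregulated genes to the pathways affected can be pictured as a lookup followed by a count. The sketch below is only that: the annotation table is hypothetical, and it does not reproduce how the PFMP query itself works.

```python
from collections import Counter

# Hypothetical gene-to-pathway annotations; a real system would query a database.
GENE_TO_PATHWAYS = {
    "gene1": ["pathway 1"],
    "gene2": ["pathway 1", "pathway 2"],
    "gene3": ["pathway 2"],
    "gene4": [],
}


def pathways_affected(cluster):
    """Count how many genes of the cluster map to each pathway, most hits first."""
    hits = Counter()
    for gene in cluster:
        # Genes without an annotation simply contribute nothing.
        hits.update(GENE_TO_PATHWAYS.get(gene, []))
    return hits.most_common()


print(pathways_affected(["gene1", "gene2", "gene4"]))
```

A statistical enrichment test would normally follow, since large pathways collect hits by chance.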
Communicability: Functional Genomics
• Interpretation of genome-scale gene expression data
External programs
DNA-chip data
Cluster of coregulated genes: gene 1, gene 2, ..., gene n
PFMP query
Similarities with known regulatory sites: site 1 (Factor 1), site 2 (Factor 2), ...
Pattern discovery (putative regulatory sites): gene 1, gene 2, ...
Other Issues
• Partial information (indirect interactions) and subsequent filling of the missing steps
• Negative results (elements that have been shown not to interact, enzymes missing in an organism)
• Putative interactions resulting from computational analyses
Protein function categories
• Catalysis (enzymes)
• Binding – transport (active/passive)
• Protein-DNA/RNA binding (e.g. histones, transcription factors)
• Protein-protein interactions (e.g. antibody-lysozyme) (experimentally determined by yeast two-hybrid (Y2H) or bacterial two-hybrid (B2H) screening)
• Protein-fatty acid binding (e.g. apolipoproteins)
• Protein – small molecules (drug interaction, structure decoding)
• Structural component (e.g. crystallin)
• Regulation
• Signalling
• Transcription regulation
• Immune system
• Motor proteins (actin/myosin)
Catalytic properties of enzymes
[Plot: reaction rate V (moles/s) against substrate concentration [S]. The curve levels off at Vmax, and V = Vmax/2 at [S] = Km.]
$$E + S \overset{K_m}{\rightleftharpoons} ES \overset{k_{cat}}{\longrightarrow} E + P$$
• E = enzyme
• S = substrate
• ES = enzyme-substrate complex (transition state)
• P = product
• Km = Michaelis constant
• kcat = catalytic rate constant (turnover number)
• kcat/Km = specificity constant (useful for comparison)
Michaelis-Menten equation:
$$V = \frac{V_{max}\,[S]}{K_m + [S]}$$
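A short numerical sketch of the Michaelis-Menten equation shows the two properties read off the plot: the rate equals Vmax/2 when [S] = Km, and it approaches Vmax as [S] grows. The parameter values below are hypothetical and chosen only for illustration.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate V at substrate concentration s (same units as km)."""
    return vmax * s / (km + s)


# Hypothetical parameters, for illustration only.
VMAX = 10.0  # moles/s
KM = 2.0     # concentration units

for s in (0.5, 2.0, 20.0, 200.0):
    print(f"[S] = {s:6.1f}  V = {michaelis_menten(s, VMAX, KM):.2f}")
```

With these values the rate is 5.00 at [S] = Km = 2.0 and has reached 9.90 at [S] = 200, close to but still below Vmax.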
Protein interaction domains
http://pawsonlab.mshri.on.ca/html/domains.html
Energy difference upon binding
Examples of protein interactions (and functional importance) include:
• Protein – protein (pathway analysis);
• Protein – small molecules (drug interaction, structure decoding);
• Protein – peptides, DNA/RNA (function analysis)
The change in Gibbs free energy upon protein-ligand binding can be monitored and is expressed as:
$$\Delta G = \Delta H - T\,\Delta S$$
(H = enthalpy, S = entropy and T = temperature)
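As a worked example of the free-energy relation, the sketch below computes the change in Gibbs free energy from an enthalpy change and an entropy change and reports whether binding is favourable (negative result). The numbers are hypothetical, and the units (kJ/mol, kJ/(mol·K), K) are an assumption of this example.

```python
def gibbs_free_energy(delta_h, delta_s, temperature):
    """Return delta G = delta H - T * delta S.

    delta_h in kJ/mol, delta_s in kJ/(mol*K), temperature in K.
    """
    return delta_h - temperature * delta_s


# Hypothetical binding event: enthalpy gain partly offset by an entropy loss.
delta_g = gibbs_free_energy(delta_h=-40.0, delta_s=-0.05, temperature=298.15)
print(f"delta G = {delta_g:.1f} kJ/mol")
print("favourable" if delta_g < 0 else "unfavourable")
```

Here the favourable enthalpy term (-40 kJ/mol) outweighs the entropic penalty (about +14.9 kJ/mol at 298.15 K), so binding is still favourable overall.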
Experimentally measuring PPIs
Yeast two-hybrid
Bait – TF binding domain
Prey – Activation domain
TF: DNA binding and activation domain together set transcription in motion
Yeast strains of opposite mating types
Make yeast strains mate and have an easily observable reporter gene (e.g. luciferase) with appropriate TFBS
Bait and Prey have to interact to activate reporter gene
Experimentally measuring PPIs
Tandem affinity purification (TAP)
• Add a TAP tag, which contains an IgG-binding domain, to the end of the target gene
• Separate protein-TAP-IgG complexes using affinity column containing IgG beads
• Wash off the column, target-IgG complex stays behind
• If target protein interacts with others, these are also retained on the column
• Separate proteins using SDS-PAGE and identify using mass-spec
• Can also use other protein in complex as target protein to verify complex formation
Protein function
• Many proteins combine functions
• For example, some immunoglobulin structures are thought to have more than 100 different functions (and active/binding sites)
• Alternative splicing can generate (partially) alternative structures
Protein function & Interaction
Active site / binding cleft
Shape complementarity
Protein function evolution
Chymotrypsin
How to infer function
• Experiment
• Deduction from sequence
  • Multiple sequence alignment – conservation patterns
  • Homology searching
• Deduction from structure
  • Threading
  • Structure-structure comparison
  • Homology modelling
Cholesterol Biosynthesis:
Cholesterol biosynthesis primarily occurs in eukaryotic cells. Cholesterol is necessary for membrane synthesis, and is a precursor for steroid hormone production as well as for vitamin D. While the pathway had previously been assumed to be localized in the cytosol and ER, more recent evidence suggests that many of the enzymes in the pathway exist largely, if not exclusively, in the peroxisome (the enzymes listed in blue in the pathway to the left are thought to be at least partly peroxisomal). Patients with peroxisome biogenesis disorders (PBDs) have a variable deficiency in cholesterol biosynthesis.
Mevalonate plays a role in epithelial cancers: it can inhibit EGFR
Cholesterol Biosynthesis: from acetyl-CoA to mevalonate
Epidermal Growth Factor as a Clinical Target in Cancer
A malignant tumour is the product of uncontrolled cell proliferation. Cell growth is controlled by a delicate balance between growth-promoting and growth-inhibiting factors. In normal tissue the production and activity of these factors results in differentiated cells growing in a controlled and regulated manner that maintains the normal integrity and functioning of the organ. The malignant cell has evaded this control; the natural balance is disturbed (via a variety of mechanisms) and unregulated, aberrant cell growth occurs. A key driver for growth is the epidermal growth factor (EGF), and the receptor for EGF (the EGFR) has been implicated in the development and progression of a number of human solid tumours including those of the lung, breast, prostate, colon, ovary, head and neck.
Energy housekeeping: Adenosine diphosphate (ADP) – Adenosine triphosphate (ATP)
Chemical Reaction
Enzymatic Catalysis
Gene Expression
Inhibition
Metabolic Pathway: Proline Biosynthesis
Transcriptional Regulation
Methionine Biosynthesis in E. coli
Shortcut Representation
High-level Interaction
Levels of Resolution
Cholesterol Biosynthesis
SREBP Pathway
Signal Transduction
Important signalling pathways: MAP-kinase (MAPK) signalling pathway, or TGF-β pathway
Transport
Phosphate Utilization in Yeast
Multiple Levels of Regulation
• Gene expression
• Protein activity
• Protein intracellular location
• Protein degradation
• Substrate transport
Graphical Representation – Gene Expression
Experimental Data – Gene Expression
Experimental Data – Transcriptional Regulation
Experimental Data – Transcriptional Regulation
Transcriptional Regulation: Integrated View
Pathways and Pathway Diagrams
• Pathways
  • Set of nodes (entities) and edges (associations)
• Pathway Diagrams
  • XY coordinates
  • Node splitting allowed
  • Multiple views of the same pathway
  • Different abstraction levels
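The distinction between a pathway (nodes and edges) and a pathway diagram (a layout of that pathway) can be sketched as two small data structures. The class and method names below are hypothetical; the point is only that positions live in the diagram, so one pathway can have several views and a node may be drawn more than once (node splitting).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Pathway:
    """A pathway: a set of nodes (entities) and edges (associations)."""
    nodes: Set[str] = field(default_factory=set)
    # Each edge is (source, target, association).
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_edge(self, source: str, target: str, association: str) -> None:
        self.nodes.update((source, target))
        self.edges.add((source, target, association))


@dataclass
class PathwayDiagram:
    """One view of a pathway: XY coordinates for its nodes."""
    pathway: Pathway
    # node -> list of (x, y); more than one position means the node is split.
    positions: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)

    def place(self, node: str, x: float, y: float) -> None:
        if node not in self.pathway.nodes:
            raise KeyError(f"{node!r} is not part of the pathway")
        self.positions.setdefault(node, []).append((x, y))


# Hypothetical example: ATP takes part in two reactions.
pathway = Pathway()
pathway.add_edge("ATP", "reaction 1", "consumed by")
pathway.add_edge("ATP", "reaction 2", "consumed by")

diagram = PathwayDiagram(pathway)
diagram.place("ATP", 10.0, 20.0)  # drawn next to reaction 1
diagram.place("ATP", 80.0, 20.0)  # drawn again next to reaction 2 (split node)

overview = PathwayDiagram(pathway)  # a second view of the same pathway
overview.place("ATP", 0.0, 0.0)
```

Keeping layout out of `Pathway` is a design choice of this sketch: the graph stays the same whatever the abstraction level of the view.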
KEGG database (Japan)
Metabolic networks
Glycolysis and Gluconeogenesis
Gene Ontology (GO)
• Not a genome sequence database
• Developing three structured, controlled vocabularies (ontologies) to describe gene products in terms of:
  • biological process
  • cellular component
  • molecular function
  in a species-independent manner
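To make the three vocabularies concrete, the sketch below groups the annotations of one gene product by vocabulary. The function name and data layout are assumptions of this example, and the GO identifiers for a citrate synthase are quoted from memory for illustration; they should be checked against the current ontology release before being relied on.

```python
from collections import defaultdict

ASPECTS = ("biological process", "cellular component", "molecular function")


def group_by_aspect(annotations):
    """Group (term_id, term_name, aspect) annotations under the three GO vocabularies."""
    grouped = defaultdict(list)
    for term_id, name, aspect in annotations:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect}")
        grouped[aspect].append((term_id, name))
    return dict(grouped)


# Illustrative annotations for a citrate synthase (IDs quoted from memory).
citrate_synthase = [
    ("GO:0006099", "tricarboxylic acid cycle", "biological process"),
    ("GO:0005739", "mitochondrion", "cellular component"),
    ("GO:0004108", "citrate (Si)-synthase activity", "molecular function"),
]

for aspect, terms in group_by_aspect(citrate_synthase).items():
    print(aspect, terms)
```

Because the terms are species-independent, the same three-part description can be attached to the orthologous gene product in another organism.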
The GO ontology
Gene Ontology Members
• FlyBase - database for the fruit fly Drosophila melanogaster
• Berkeley Drosophila Genome Project (BDGP) - Drosophila informatics; GO database & software, Sequence Ontology development
• Saccharomyces Genome Database (SGD) - database for the budding yeast Saccharomyces cerevisiae
• Mouse Genome Database (MGD) & Gene Expression Database (GXD) - databases for the mouse Mus musculus
• The Arabidopsis Information Resource (TAIR) - database for the brassica family plant Arabidopsis thaliana
• WormBase - database for the nematode Caenorhabditis elegans
• EBI GOA project: annotation of UniProt (Swiss-Prot/TrEMBL/PIR) and InterPro databases
• Rat Genome Database (RGD) - database for the rat Rattus norvegicus
• DictyBase - informatics resource for the slime mold Dictyostelium discoideum
• GeneDB S. pombe - database for the fission yeast Schizosaccharomyces pombe (part of the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute)
• GeneDB for protozoa - databases for Plasmodium falciparum, Leishmania major, Trypanosoma brucei, and several other protozoan parasites (part of the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute)
• Genome Knowledge Base (GK) - a collaboration between Cold Spring Harbor Laboratory and EBI
• TIGR - The Institute for Genomic Research
• Gramene - A Comparative Mapping Resource for Monocots
• Compugen (with its Internet Research Engine)
• The Zebrafish Information Network (ZFIN) - reference datasets and information on Danio rerio