proteomics-introduction.ppt
TRANSCRIPT
Defining Proteomics
• Branch of discovery science focusing on proteins
• The proteome was defined in 1994 as “the complete set of proteins that is expressed and modified following expression by the entire genome in the lifetime of a cell”. For a whole organism, this means looking at the proteome of trillions of cells and ~1000 different cell types with different protein profiles.
• Can be defined more specifically, such as the complement of proteins expressed by a cell at any one time.
• Today proteomics is a scientific discipline that aims to bridge the gap between our understanding of genome sequences and cellular behavior.
Genomics and Proteomics: a new field with a new vocabulary
-omics: suffix meaning an area of research
DNA → Genome → Genomics
RNA → Transcriptome → Transcriptomics
Proteins → Proteome → Proteomics
Metabolites → Metabolome → Metabolomics
Protein-protein, protein-DNA, and protein-RNA interactions → Interactome → Interactomics
Each -omics approach can be global or targeted.
Related terms: Functional genomics, Systems Biology
Diagram labels: Gene name; Interaction
Knowledge from proteomics studies is limited by our inability to analyze large data sets efficiently
• Proteomics studies highlight the extreme complexity of interactions on a genomic scale.
• Proteomics faces the challenge of analyzing large, highly complex, and very noisy data sets.
• Bioinformatics is integrated into proteomics projects to mine data and is becoming increasingly important.
Proteomics
– Makes use of protein science to develop high-throughput technologies to study the whole proteome.
– Proteomics combined with microarray technology and bioinformatics can explore systems biology and is becoming increasingly powerful.
– Proteomics is currently directed towards protein profiling and protein discovery.
– Proteomics can help resolve important biological mechanisms in combination with other disciplines such as Molecular and Cell Biology, Genetics, etc.
How does proteomics help to identify genes involved in important diseases?
An example in Human Genetics
Genomics databases contain the information needed to identify the candidate genes involved in human diseases
SNPs: Single Nucleotide Polymorphisms
NCBI: National Center for Biotechnology Information
More than one candidate gene
Analysis of genomics, microarray gene expression, and proteomics data contained in public databases can identify the gene involved in a particular human disease
Only one candidate gene
Computer search
2D gel
Microarray gene expression data
Disease gene identified, with mutations
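The narrowing from "more than one candidate gene" to "only one candidate gene" can be pictured as intersecting independent lines of evidence. The sketch below is only an illustration of that idea: the gene identifiers and the helper `narrow_candidates` are hypothetical and do not come from the slides or from any real database.

```python
# Hypothetical gene identifiers, invented purely for illustration.
genomic_candidates = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}  # e.g. genes in a mapped disease region
microarray_changed = {"GENE_B", "GENE_C", "GENE_X"}            # transcripts altered in patients
proteomics_changed = {"GENE_C", "GENE_Y"}                      # proteins altered on 2D gels


def narrow_candidates(first, *others):
    """Keep only the genes supported by every evidence set supplied."""
    remaining = set(first)
    for evidence in others:
        remaining &= set(evidence)
    return remaining


# Genomic evidence plus expression data still leaves more than one candidate.
after_expression = narrow_candidates(genomic_candidates, microarray_changed)

# Adding proteomics evidence leaves a single candidate in this toy data set.
final_candidates = narrow_candidates(
    genomic_candidates, microarray_changed, proteomics_changed
)

print(sorted(after_expression))
print(sorted(final_candidates))
```

In practice each evidence set would come from a database query or an experiment, and a strict intersection is the simplest possible combination rule; a real analysis would more likely weigh or rank the evidence.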
Automated DNA sequencing
In the 1970s, the DNA sequencing efforts of Gilbert and Sanger led to the decoding of DNA stretches a few hundred bases long.
The first sequence of a viral genome of 5000 base pairs, in 1978, highlighted the unique insights into gene structure, function and genome organization that can be obtained when a vast amount of genetic information is generated by sequencing.
In 1985 Gilbert and others launched the genomic era by improving the existing DNA sequencing technology towards intensive automation.
In 1998 full automation was achieved with an integrated machine that could produce DNA sequences in a factory-like manner.
The latest sequencing machines can decode 1.5 million bases in 24 hours, 6000 times the throughput of the prototype.
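The throughput figures can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted on the slides (1.5 million bases per day, a 6000-fold improvement, and the 3 billion bases given for the human genome draft). It assumes a single machine and a single pass over the genome with no redundant coverage, which is a simplification.

```python
LATEST_BASES_PER_DAY = 1_500_000   # from the slide: 1.5 million bases in 24 hours
FOLD_IMPROVEMENT = 6000            # from the slide: 6000 times the prototype
GENOME_BASES = 3_000_000_000       # from the Human Genome Project slide: 3 billion bases

# Implied throughput of the prototype machine.
prototype_bases_per_day = LATEST_BASES_PER_DAY / FOLD_IMPROVEMENT

# Days for ONE machine to read the genome ONCE (assumption: no repeated coverage).
days_latest = GENOME_BASES / LATEST_BASES_PER_DAY
days_prototype = GENOME_BASES / prototype_bases_per_day

print(f"Prototype: {prototype_bases_per_day:.0f} bases/day")
print(f"Latest machine: {days_latest:.0f} days ({days_latest / 365:.1f} years) per pass")
print(f"Prototype: {days_prototype / 365:.0f} years per pass")
```

Even under these generous assumptions a single modern machine needs several years per pass, which is why the project depended on many machines running in parallel.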
The Human Genome Project
• Started officially in 1990, following discussions about DNA sequencing technologies that began in 1985.
• The objective was to obtain the genome sequence in 15 years.
• In 2001, two versions of the draft, each consisting of about 3 billion bases, were made available by the biotech company Celera and by the human genome sequencing consortium.
• In the process, tools and methodology were developed to sequence other genomes (100 genomes to date).
• The entire RNA and protein output encoded by the genome can be made available in public databases to facilitate hypothesis-driven science and global analysis.
• The HGP pushed the development of high-throughput sequencing tools, which are now driving the creation of other methodologies for analyzing related biological information such as RNA, proteins and molecular interactions: microarrays for gene expression and mass spectrometry for proteomics.
Digital Nature of Biological Information
• The value of genome sequence is that we can study a biological system with a precise digital core of information.
• The challenge is to find which information is encoded within the digital code.
• The genome encodes the protein and RNA machines of life and the regulatory networks that specify how these genes are expressed in time, space and amplitude.
• The evolution of the regulatory networks, and not of the genes themselves, plays a critical role in making organisms different from one another.
Digital Nature of Biological Information
The digital information operates over three diverse time spans:
Evolution: tens to millions of years
Development: hours to tens of years
Physiology: milliseconds to weeks
– Regulatory networks are composed of two components: transcription factors and their DNA binding sites, which represent the control regions of genes.
– Control regions serve as information processors that convert the concentrations of different transcription factors into signals that mediate gene expression to carry out developmental or physiological functions.
Digital Nature of Biological Information
– Biology has evolved several different types of information organized into hierarchical structures.
– First, a regulatory hierarchy of gene networks defines the relationships among a set of transcription factors and regulatory elements controlling particular aspects of development.
– Second, an evolutionary hierarchy defines an ordered set of relationships arising from the duplication of genes, for example the duplication of a gene to generate a gene family.
– Third, molecular machines may be assembled into structural hierarchies by an ordered assembly process. The ribosome is assembled from more than 50 different proteins.
– Finally, informational theory describes the flow of information from gene to environment according to the following scheme:
Digital Nature of Biological Information
Informational theory describes the flow of information from gene to environment according to the following scheme:
Gene → RNA → Protein → Protein interactions → Protein complexes → Networks of protein complexes → Tissues and organs → Organism → Ecosystem
Systems approaches to biology
• A human starts as a single fertilized cell and develops into an adult made of trillions of cells and thousands of cell types.
• During this process two types of digital information are used:
– genome information
– environmental information, such as:
  • metabolite concentrations
  • secreted or cell-surface signals from other cells, chemical agents, etc.
• Information can be predetermined (deterministic) or random (stochastic).
Example: Antibody diversity is generated by stochastic signals following exposure to an antigen. The expansion in the number of B cells secreting an antibody is directly related to the affinity of that antibody for the antigen: the higher the affinity, the more strongly the cells producing this antibody are selected for survival and proliferation.
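The antibody example combines a stochastic step (which affinities arise) with a deterministic one (selection by affinity). The toy model below is a sketch of that idea only. The growth rule "multiply by (1 + affinity) each round", the number of clones, and the number of rounds are all assumptions made for illustration; they are not taken from the slides or from immunology data.

```python
import random


def clonal_selection(affinities, rounds=5, start_cells=1.0):
    """Grow each B-cell clone by a factor of (1 + affinity) per round (toy rule)."""
    counts = [start_cells] * len(affinities)
    for _ in range(rounds):
        counts = [count * (1.0 + aff) for count, aff in zip(counts, affinities)]
    return counts


rng = random.Random(0)  # fixed seed so the run is repeatable

# Stochastic step: each clone receives a random antigen affinity in [0, 1).
affinities = [rng.random() for _ in range(6)]

# Deterministic step: clones expand according to their affinity.
counts = clonal_selection(affinities)

best_clone = max(range(len(counts)), key=counts.__getitem__)
share_of_best = counts[best_clone] / sum(counts)

print(f"Clone {best_clone} (affinity {affinities[best_clone]:.2f}) "
      f"makes up {share_of_best:.0%} of the population")
```

Because every clone starts from the same size and growth increases with affinity, the clone with the highest affinity always ends up the most abundant, which mirrors the selection described on the slide.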
What is Proteomics?
• Proteomics - A newly emerging field of life science research that uses high-throughput (HT) technologies to display, identify and/or characterize all the proteins in a given cell, tissue or organism (i.e., the proteome).
3 Kinds of Proteomics
• Expressional Proteomics
– Electrophoresis, Protein Chips, DNA Chips, SAGE
– Mass Spectrometry, Microsequencing
• Functional Proteomics
– HT Functional Assays, Ligand Chips
– Yeast 2-hybrid, Deletion Analysis, Motif Analysis
• Structural Proteomics
– High throughput X-ray Crystallography/Modelling
– High throughput NMR Spectroscopy/Modelling
Expressional Proteomics
Figure panels: 2-D Gel; QTOF Mass Spectrometry
Expressional Proteomics
Figure panels: Prostate tumor; Normal
Expressional Proteomics
Why Expressional Proteomics?
• Concerned with the display, measurement and analysis of global changes in protein expression
• Monitors global changes arising from application of drugs, pathogens or toxins
• Monitors changes arising from developmental, environmental or disease perturbations
• Applications in medical diagnostics and therapeutic drug monitoring
Functional Proteomics
Functional Proteomics (in silico)
AHGQSDFILDEADGMMKSTVPN…
HGFDSAAVLDEADHILQWERTY…
GGGNDEYIVDEADSVIASDFGH…
*[LIVM][LIVM]DEAD*[LIVM][LIVM]*
(EIF 4A ATP DEPENDENT HELICASE)
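The motif on this slide can be searched for with a regular expression. The sketch below translates the slide's pattern under two assumptions about its notation: the `*` between `DEAD` and the second `[LIVM][LIVM]` is read as exactly one arbitrary residue, and the leading and trailing `*` are read as arbitrary flanking sequence. The function name `find_motif` is hypothetical.

```python
import re

# Slide pattern: *[LIVM][LIVM]DEAD*[LIVM][LIVM]*
# Assumption: the inner "*" is one arbitrary residue; the outer "*" symbols
# mean "any flanking sequence", so an unanchored search covers them.
DEAD_BOX = re.compile(r"[LIVM]{2}DEAD.[LIVM]{2}")

# The three sequence fragments shown on the slide (the trailing "…" is dropped).
fragments = [
    "AHGQSDFILDEADGMMKSTVPN",
    "HGFDSAAVLDEADHILQWERTY",
    "GGGNDEYIVDEADSVIASDFGH",
]


def find_motif(sequence):
    """Return (start_index, matched_text) for the first motif hit, or None."""
    hit = DEAD_BOX.search(sequence)
    return (hit.start(), hit.group()) if hit else None


hits = [find_motif(seq) for seq in fragments]
for seq, hit in zip(fragments, hits):
    print(seq, hit)
```

All three fragments match at the same offset, which reflects the alignment the slide uses to derive the pattern. Scanning a whole proteome for hits in this way is one simple form of in silico functional annotation.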
Functional Proteomics (in vitro)
• Multi-well plate readers
• Full automation/robotics
• Fluorescent and/or chemiluminescent detection
• Small volumes (µL)
• Up to 1536 wells/plate
• Up to 200,000 tests/day
• Mbytes of data/day
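The plate and throughput figures imply a plate-handling rate that is easy to work out. The sketch below assumes one test per well and round-the-clock operation; both are simplifications and are not stated on the slide.

```python
import math

WELLS_PER_PLATE = 1536   # from the slide: up to 1536 wells/plate
TESTS_PER_DAY = 200_000  # from the slide: up to 200,000 tests/day

# Assumption: one test per well, so the plate count is the test count
# divided by the plate size, rounded up to whole plates.
plates_per_day = math.ceil(TESTS_PER_DAY / WELLS_PER_PLATE)

# Assumption: the robot runs 24 hours a day.
minutes_per_plate = 24 * 60 / plates_per_day

print(f"{plates_per_day} plates/day, one plate every {minutes_per_plate:.1f} minutes")
```

A plate roughly every eleven minutes, day and night, is the kind of pace that makes full automation a requirement rather than a convenience.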
Functional Proteomics
Functional Proteomics
• In silico methods (bioinformatics)
• Genome-wide Protein Tagging
• Genome-wide Gene Deletion or Knockouts
• Random Tagged Mutagenesis or Transposon Insertion
• Yeast two-hybrid Methods
• Protein (Ligand) Chips
Why Functional Proteomics?
• Concerned with the identification and classification of protein functions, activities, locations and interactions at a global level
• To compare organisms at a global level so as to extract phylogenetic information
• To understand the network of interactions that take place in a cell at a molecular level
• To predict the phenotypic response of a cell or organism to perturbations or mutations
From Genotype to Phenotype
Structural Proteomics
• High Throughput protein structure determination via X-ray crystallography, NMR spectroscopy or comparative molecular modeling
Structural Proteomics: The Goal
Structural Proteomics: The Motivation
Figure: number of sequences (axis 0 to 2,000,000) and number of structures (axis 0 to 200,000) by year, 1980 to 2005.
The Protein Fold Universe
How Big Is It???
500?
2000?
10000?
∞?
Protein Structure Initiative
• Organize all known protein sequences into sequence families
• Select family representatives as targets
• Solve the 3D structures of these targets by X-ray or NMR
• Build models for the remaining proteins via comparative (homology) modeling
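The four steps above amount to clustering sequences into families and picking one representative per family as the experimental target. The sketch below uses a greedy clustering scheme as one simple way to do this. It is not the Protein Structure Initiative's actual procedure. The `identity` function (ungapped, left-aligned comparison), the 30% threshold, and the toy sequences are all assumptions chosen for illustration.

```python
def identity(a, b):
    """Fraction of identical positions in an ungapped, left-aligned comparison."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def group_into_families(sequences, threshold=0.3):
    """Greedy clustering: join the first family whose representative is similar
    enough, otherwise found a new family with this sequence as representative."""
    families = {}  # representative -> list of member sequences
    for seq in sequences:
        for representative in families:
            if identity(seq, representative) >= threshold:
                families[representative].append(seq)
                break
        else:
            families[seq] = [seq]
    return families


# Hypothetical toy sequences, invented for this example.
sequences = ["MKTAYIAKQR", "MKTAYIAKQK", "MKSAYIAKQR", "GSHMLEDPVT", "GSHMLEDPAT"]

families = group_into_families(sequences)
targets = list(families)                                # solve by X-ray or NMR
to_model = [s for s in sequences if s not in families]  # build by homology modeling

print("Targets:", targets)
print("To model:", to_model)
```

Here two structures would be solved experimentally and three would be modeled from them. Real target selection would use gapped alignments and profile-based family definitions rather than this position-by-position comparison.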
Protein Structure Initiative
• Organize and recruit interested structural biologists and structural biology centres from around the world
• Coordinate target selection
• Develop new kinds of high throughput techniques
• Solve, solve, solve, solve….
Why Structural Proteomics?
• Structure → Function
• Structure → Mechanism
• Structure-based Drug Design
• Solving the Protein Folding Problem
• Keeps Structural Biologists Employed
Bioinformatics & Proteomics
Proteomics, Genomics
Medicine
Bioinformatics
Agriculture
Bioinformatics & Functional Proteomics
• How to classify proteins into functional classes?
• How to compare one proteome with another?
• How to include functional/activity/pathway information in databases?
• How to extract functional motifs from sequence data?
• How to predict phenotype from proteotype?
Bioinformatics & Expressional Proteomics
• How to correlate changes in protein expression with disease?
• How to distinguish important from unimportant changes in expression?
• How to compare, archive, retrieve gel data?
• How to rapidly, accurately identify proteins from MS and 2D gel data?
• How to include expression info in databases?
Bioinformatics & Structural Proteomics
• How to predict 3D structure from 1D sequence?
• How to determine function from structure?
• How to classify proteins on basis of structure?
• How to recognize 3D motifs and patterns?
• How to use bioinformatics databases to help in 3D structure determination?
• How to predict which proteins will express well or produce stable, folded molecules?