proteomics-introduction.ppt

Defining Proteomics

• Branch of discovery science focusing on proteins

• The proteome was defined in 1994 as “the complete set of proteins that is expressed, and modified following expression, by the entire genome in the lifetime of a cell”. For a whole organism, this means the proteome of trillions of cells and ~1000 different cell types, each with its own protein profile.

• The definition can be more specific, for example the complement of proteins expressed by a cell at any one time.

• Today proteomics is a scientific discipline that will bridge the gap between our understanding of genome sequences and cellular behavior.

Genomics and Proteomics: a new field with a new vocabulary

-omics: denotes an area of research

Molecule(s) → "-ome" → "-omics" field

DNA → Genome → Genomics

RNA → Transcriptome → Transcriptomics

Proteins → Proteome → Proteomics

Metabolites → Metabolome → Metabolomics

Protein-protein, protein-DNA, protein-RNA interactions → Interactome → Interactomics

Other diagram labels: Functional genomics; Systems Biology; "global" / "Targeted" (repeated four times); Gene name; Interaction

Knowledge from proteomics studies is limited by our inability to analyze large data sets efficiently

• Proteomics studies highlight the extreme complexity of interactions on a genomic scale.

• Proteomics faces the challenge of analyzing large, highly complex and very noisy data sets.

• Bioinformatics is integrated in proteomics projects to mine data and is becoming more and more important.

Proteomics

– Makes use of protein science to develop high-throughput technologies to study the whole proteome.

– Proteomics combined with microarray technology and bioinformatics can explore systems biology and is becoming increasingly powerful.

– Proteomics is currently directed towards protein profiling and protein discovery.

– Proteomics can elucidate important biological mechanisms in combination with other methods such as molecular and cell biology, genetics, etc.

How does proteomics help to identify genes involved in important diseases?

An example in Human Genetics

Genomics databases contain the information needed to identify the candidate genes involved in human diseases

SNPs: Single Nucleotide Polymorphisms

NCBI: National Center for Biotechnology Information

More than one candidate gene

Analysis of genomics, microarray gene expression and proteomics data contained in public databases can identify the gene involved in a particular human disease

Only one candidate gene

Computer Search

2D gel

Microarray gene expression data

Disease gene identified, with mutations

Automated DNA sequencing

In the 1970s, the efforts of Gilbert and Sanger to sequence DNA led to the decoding of stretches a few hundred bases long.

The first sequence of a viral genome of about 5000 base pairs, in 1978, highlighted the unique insights into gene structure, function and genome organization that can be obtained when sequencing generates a vast amount of genetic information.

In 1985 Gilbert and others launched the genomic era by pushing the existing DNA sequencing technology towards intensive automation.

In 1998 full automation was achieved in an integrated machine that could produce DNA sequences factory-style.

The latest sequencing machines can decode 1.5 million bases in 24 hours, 6000 times the throughput of the prototype.
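
As a back-of-the-envelope check on these figures, the sketch below derives the prototype throughput implied by the quoted speed-up and the machine time needed to read about 3 billion bases once. The constant and function names are illustrative, and the single-pass assumption (no redundancy, no overlap between reads) is a simplification that real sequencing projects do not meet.

```python
# Figures quoted in the slides; all names here are illustrative only.
BASES_PER_DAY = 1_500_000        # latest machine, bases decoded per 24 hours
SPEEDUP_OVER_PROTOTYPE = 6000    # quoted throughput ratio versus the prototype
GENOME_SIZE = 3_000_000_000      # approximate size of the human genome draft


def prototype_bases_per_day(current=BASES_PER_DAY, speedup=SPEEDUP_OVER_PROTOTYPE):
    """Prototype throughput implied by the quoted speed-up."""
    return current / speedup


def machine_days_for_genome(genome_size=GENOME_SIZE, rate=BASES_PER_DAY):
    """Days one machine needs to read the genome once.

    Assumption: a single pass with no redundancy or overlap.
    """
    return genome_size / rate


if __name__ == "__main__":
    print(prototype_bases_per_day())   # 250.0 bases per day
    print(machine_days_for_genome())   # 2000.0 machine-days
```

Under these assumptions, a single machine would still need several years for one pass, which is why sequencing centres ran many machines in parallel.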


The Human Genome Project

• Started officially in 1990, following discussions about DNA sequencing technologies that began in 1985.

• The objective was to obtain the genome sequence in 15 years.

• In 2001 two versions of the draft, each comprising about 3 billion bases, were made available by the biotech company Celera and by the Human Genome Sequencing Consortium.

• In the process, tools and methodology were developed to sequence other genomes (100 genomes to date).

• The entire RNA and protein output encoded by the genome can be made available in public databases to facilitate hypothesis-driven science and global analysis.

• The HGP pushed the development of high-throughput tools for sequencing, which are now driving the creation of related methodologies, such as microarrays for gene expression and mass spectrometry for proteomics, for the analysis of other biological information such as RNA, proteins and molecular interactions.

Digital Nature of Biological Information

• The value of the genome sequence is that we can study a biological system starting from a precise digital core of information.

• The challenge is to find out what information is encoded within the digital code.

• The genome encodes the protein and RNA machines of life and the regulatory networks that specify how these genes are expressed in time, space and amplitude.

• The evolution of the regulatory networks, and not of the genes themselves, plays a critical role in making organisms different from one another.


The digital information operates on three diverse time spans:

Evolution: tens to millions of years

Development: hours to tens of years

Physiology: milliseconds to weeks

– Regulatory networks are composed of two components: transcription factors and the DNA sites they bind in the control regions of genes.

– Control regions serve as information processors that integrate the concentrations of different transcription factors into signals that mediate gene expression to carry out developmental or physiological functions.


– Biology has organized several different types of information into hierarchical structures.

– First, a regulatory hierarchy of gene networks defines the relationships among a set of transcription factors and the regulatory elements controlling a particular aspect of development.

– Second, an evolutionary hierarchy defines an ordered set of relationships arising from the duplication of genes, for example the duplication of a gene to generate a gene family.

– Third, molecular machines may be assembled into structural hierarchies by an ordered assembly process. The ribosome, for example, is assembled from more than 50 different proteins.

– Finally, an informational hierarchy describes the flow of information from gene to environment according to the following scheme:


An informational hierarchy describes the flow of information from gene to environment according to the following scheme:

Gene → RNA → Protein → Protein interactions → Protein complexes → Network of protein complexes → Tissues/Organs → Organism → Ecosystem

Systems approaches to biology

• A human starts as a single fertilized cell and develops into an adult made of trillions of cells and thousands of cell types.

• During this process two types of digital information are used:

– genome information

– environmental information, such as metabolite concentrations, secreted or cell-surface signals from other cells, chemical agents, etc.

• Information can be predetermined (deterministic) or random (stochastic).

Example: antibody diversity is generated by stochastic events. Following exposure to an antigen, the expansion in the number of B cells secreting a given antibody is directly related to the affinity of that antibody for the antigen: the higher the affinity, the more strongly the cells producing the antibody are selected for survival and proliferation.


What is Proteomics?

• Proteomics: an emerging field of life science research that uses high-throughput (HT) technologies to display, identify and/or characterize all the proteins in a given cell, tissue or organism (i.e., the proteome).


3 Kinds of Proteomics

• Expressional Proteomics
– Electrophoresis, Protein Chips, DNA Chips, SAGE
– Mass Spectrometry, Microsequencing

• Functional Proteomics
– HT Functional Assays, Ligand Chips
– Yeast 2-hybrid, Deletion Analysis, Motif Analysis

• Structural Proteomics
– High throughput X-ray Crystallography/Modelling
– High throughput NMR Spectroscopy/Modelling


Expressional Proteomics

2-D Gel

QTOF Mass Spectrometry



Prostate tumor Normal


Why Expressional Proteomics?

• Concerned with the display, measurement and analysis of global changes in protein expression

• Monitors global changes arising from application of drugs, pathogens or toxins

• Monitors changes arising from developmental, environmental or disease perturbations

• Applications in medical diagnostics and therapeutic drug monitoring


Functional Proteomics


Functional Proteomics (in silico)

AHGQSDFILDEADGMMKSTVPN…
HGFDSAAVLDEADHILQWERTY…
GGGNDEYIVDEADSVIASDFGH…

*[LIVM][LIVM]DEAD*[LIVM][LIVM]*

(EIF 4A ATP DEPENDENT HELICASE)
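
The in silico step shown here, scanning sequences for a conserved functional motif, can be sketched with a regular expression. This is a minimal illustration, not the tool used for the slide. It assumes that the "*" between DEAD and the second [LIVM] pair stands for exactly one residue of any type and that the leading and trailing "*" stand for arbitrary flanking sequence; the function name `find_motif` is hypothetical.

```python
import re

# Sequence fragments as shown on the slide (truncated there with "…").
FRAGMENTS = [
    "AHGQSDFILDEADGMMKSTVPN",
    "HGFDSAAVLDEADHILQWERTY",
    "GGGNDEYIVDEADSVIASDFGH",
]

# Assumed reading of the slide's motif: two residues from {L, I, V, M},
# then DEAD, then any single residue, then two more from {L, I, V, M}.
DEAD_BOX = re.compile(r"[LIVM]{2}DEAD.[LIVM]{2}")


def find_motif(sequence, pattern=DEAD_BOX):
    """Return (start_index, matched_text) for the first hit, or None."""
    match = pattern.search(sequence)
    return (match.start(), match.group()) if match else None


if __name__ == "__main__":
    for fragment in FRAGMENTS:
        print(fragment, find_motif(fragment))
```

All three fragments contain the motif at the same offset, which is what makes the shared pattern usable as a signature for assigning a putative function to a new sequence.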


Functional Proteomics (in vitro)

• Multi-well plate readers

• Full automation/robotics

• Fluorescent and/or chemiluminescent detection

• Small volumes (µL)

• Up to 1536 wells/plate

• Up to 200,000 tests/day

• Mbytes of data/day



• In silico methods (bioinformatics)

• Genome-wide Protein Tagging

• Genome-wide Gene Deletion or Knockouts

• Random Tagged Mutagenesis or Transposon Insertion

• Yeast two-hybrid Methods

• Protein (Ligand) Chips


Why Functional Proteomics?

• Concerned with the identification and classification of protein functions, activities, locations and interactions at a global level

• To compare organisms at a global level so as to extract phylogenetic information

• To understand the network of interactions that take place in a cell at a molecular level

• To predict the phenotypic response of a cell or organism to perturbations or mutations


From Genotype to Phenotype


Structural Proteomics

• High Throughput protein structure determination via X-ray crystallography, NMR spectroscopy or comparative molecular modeling


Structural Proteomics: The Goal


Structural Proteomics: The Motivation

[Chart: Sequences and Structures by year, 1980 to 2005; one axis runs from 0 to 2,000,000 and the other from 0 to 200,000]


The Protein Fold Universe

How Big Is It???

500?

2000?

10000?

8 ?


Protein Structure Initiative

• Organize all known protein sequences into sequence families

• Select family representatives as targets

• Solve the 3D structures of these targets by X-ray or NMR

• Build models for the remaining proteins via comparative (homology) modeling
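
The four steps above can be sketched as a toy pipeline: group sequences into families, take one representative per family as the experimental target, and leave the other members for comparative modeling. This is only an illustration under loud assumptions. The ungapped identity measure, the 30% threshold, the greedy grouping, and the sequences `protA` to `protC` are all hypothetical and are not the procedure or data of the Protein Structure Initiative, which would rely on proper sequence alignment and clustering tools.

```python
def identity(a, b):
    """Fraction of identical positions over the shorter sequence.

    Toy measure: compares position by position with no gaps (assumption).
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def group_into_families(sequences, threshold=0.3):
    """Greedy grouping: join the first family whose representative is similar enough.

    The first member of each family acts as its representative.
    """
    families = []
    for name, seq in sequences.items():
        for family in families:
            representative = family[0]
            if identity(seq, sequences[representative]) >= threshold:
                family.append(name)
                break
        else:
            families.append([name])  # no similar family found: start a new one
    return families


def plan_targets(families):
    """Split proteins into experimental targets and those left for homology modeling."""
    targets = [family[0] for family in families]
    to_model = [name for family in families for name in family[1:]]
    return targets, to_model


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    toy = {"protA": "MKTAYIAKQR", "protB": "MKTAYIAKQL", "protC": "GGSWDEFHNP"}
    families = group_into_families(toy)
    print(families)                # [['protA', 'protB'], ['protC']]
    print(plan_targets(families))  # (['protA', 'protC'], ['protB'])
```

The point of the design is economy: only one structure per family has to be solved experimentally, and the rest are inferred from it.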


Protein Structure Initiative

• Organize and recruit interested structural biologists and structural biology centres from around the world

• Coordinate target selection

• Develop new kinds of high throughput techniques

• Solve, solve, solve, solve….


Why Structural Proteomics?

• Structure → Function

• Structure → Mechanism

• Structure-based Drug Design

• Solving the Protein Folding Problem

• Keeps Structural Biologists Employed


Bioinformatics & Proteomics

Proteomics

Genomics

Medicine

Bioinformatics

Agriculture


Bioinformatics & Functional Proteomics

• How to classify proteins into functional classes?

• How to compare one proteome with another?

• How to include functional/activity/pathway information in databases?

• How to extract functional motifs from sequence data?

• How to predict phenotype from proteotype?


Bioinformatics & Expressional Proteomics

• How to correlate changes in protein expression with disease?

• How to distinguish important from unimportant changes in expression?

• How to compare, archive, retrieve gel data?

• How to rapidly, accurately identify proteins from MS and 2D gel data?

• How to include expression info in databases?
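
One widely used answer to the identification question above is peptide mass fingerprinting: digest each candidate protein in silico, compute the peptide masses, and count how many observed masses they explain. The sketch below is a simplified illustration and not a method taken from these slides. It assumes trypsin cleavage after K or R (except before P), no missed cleavages, neutral monoisotopic masses rather than m/z values, and no modifications; the function names, the 0.01 Da tolerance and the example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matching_masses(observed_masses, protein, tolerance=0.01):
    """Number of observed masses explained by the protein's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


if __name__ == "__main__":
    candidate = "AGKCRPDEK"  # hypothetical sequence, for illustration only
    print(tryptic_digest(candidate))
    print(count_matching_masses([274.164], candidate))
```

Ranking every database entry by this count gives a crude identification score; practical search engines add statistical scoring, missed cleavages and modifications.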


Bioinformatics & Structural Proteomics

• How to predict 3D structure from 1D sequence?

• How to determine function from structure?

• How to classify proteins on basis of structure?

• How to recognize 3D motifs and patterns?

• How to use bioinformatics databases to help in 3D structure determination?

• How to predict which proteins will express well or produce stable, folded molecules?
