
Post on 01-Jan-2016


Proteomics and annotation

Definition of proteomics

• Study of all the proteins in an organism

• Term derived from genomics: the study of all the DNA in an organism

• On some levels it is a catalog of all the functional proteins, but in many contexts it is also the study of the interactions of the proteins

Central Dogma

• DNA --> RNA --> AA --> function
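The DNA --> RNA --> AA steps can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code, a coding-strand DNA sequence read in frame from its first base, and no splicing; the function names are made up for this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """DNA --> RNA: copy the coding strand, replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """RNA --> AA: read codons in frame and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].replace("U", "T")  # table is keyed on DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCTAA")
print(rna, translate(rna))
```

The last step, AA --> function, is the part that cannot be read off a table, which is where proteomics and annotation come in.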

Proteomics techniques

• Protein identification/quantification
  – High throughput remains elusive
• Now typically
  – Separate
  – Isolate
  – Identify
• Enumerating protein interactions
  – Protein-protein
  – Protein-DNA/RNA

How to separate proteins

• Proteins are made up of 20 AA, not 4 NT
  – DNA: size (migration through a charged field)
  – Protein
    • Size
    • Charge
    • Hydrophobic
    • Solubility
    • Fraction of the cell
    • Much more structure
    • …

2D gels

[Figure: 2D gel. One axis runs from big to little proteins, the other from pH 3 to pH 10.]

Limitations of 2D

• Very large and small proteins don't work well
• Membrane-bound proteins
  – Solubility of the protein
  – Disulfide bonds
• Rare proteins
  – Can stain with silver stain
    » Non-linear
    » 100X

Mass spectrometry

• Simple principle
  – Explode the charged peptides off the sample
    • Electro-spray: charged cone
    • Laser -> Vapor -> charged grid
  – See how big they are
    • Detect number of ions per mass
  – Ion trap: kind of like a TV
  – TOF (time of flight): how long did it take to travel a fixed distance

Mass of AA

[Figure: mass spectrum, with labels for the actual mass, the major ion (+H), and the C13 peak.]
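The relation between the amino acid masses, the actual peptide mass, and the major +H ion can be worked through as a short calculation. This is a sketch, assuming monoisotopic residue masses (standard values rounded to five decimals) and simple protonation; the function names are illustrative and not taken from the slides.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # mass of the charge-carrying proton


def peptide_mass(sequence):
    """Actual (neutral, monoisotopic) mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mass_to_charge(sequence, charge=1):
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


print(round(peptide_mass("GG"), 4))          # neutral mass
print(round(mass_to_charge("GG"), 4))        # major ion, +H
print(round(mass_to_charge("GG", 2), 4))     # doubly charged ion
```

Each C13 atom in the peptide shifts the peak by about 1.00335 Da divided by the charge, which is why a small ladder of peaks follows the major ion.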

Post-translational modification

• Cleavage
  – Removing portions of the protein by enzymatic action
  – Can change location, function, activity
• Additions
  – Adding a chemical group
    • Regulated activity
    • Can change protein function/activity

Modifications

Modification        Effect
Phosphorylation     Activate/inactivate
Acetylation         Stability (histones)
Acylation           Membrane association
Glycosylation       Signaling
GPI anchor          Membrane association
Hydroxyproline      Stability
Sulfation           Protein-protein interaction
Disulfide           Stability
Deamination         Protein-protein interaction
Pyroglutamic acid   Stability
Ubiquitination      Destruction signal

Limitations of mass spec

• Most frequently sequenced protein: keratin (a common contaminant)
  – Ionization is not strictly quantitative
• Can cleave the protein into peptides
  – Complicated by mixtures
  – Issues with searching the database
    • http://prospector.ucsf.edu/
    • http://fields.scripps.edu/sequest/index.html
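Cleaving a protein into peptides before mass spec is usually done with trypsin, and the same cut is made in silico when searching a database. The sketch below assumes the common simplified rule (cut after K or R unless the next residue is P) and no missed cleavages; the function name and test sequence are invented for illustration.

```python
def tryptic_digest(sequence):
    """Split a protein into peptides using a simplified trypsin rule."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Trypsin cuts on the C-terminal side of K or R, but not before P.
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


print(tryptic_digest("MKRPAKG"))
```

A search engine compares the masses of such predicted peptides with the observed spectrum, so a sample containing a mixture of proteins yields overlapping peptide sets that are harder to assign.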

QCAT

• Way to quantitatively analyze multiple proteins (Nature Methods 2, 587 - 589 (2005)).

• Depends on a concatemer (QCAT): an artificial protein assembled from segments of the proteins of interest. Each protein contributes one segment, chosen to be a peptide that a tryptic digest would produce.

Cont.

• Express the QCAT with heavy and light isotopes, get a standard curve

• Spike your sample with heavy QCAT

• This produces an internal standard for each protein of interest.

• This allows quantitation of many (~100) proteins in one experiment.
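The internal-standard arithmetic behind the spike-in can be sketched as follows. This assumes the heavy and light forms of a peptide ionize equally and that every QCAT peptide is released in equimolar amounts; the protein names, intensities, and spike amount are made-up illustration values.

```python
def quantify(light_intensity, heavy_intensity, heavy_spike_fmol):
    """Amount of sample peptide from the light/heavy peak ratio."""
    return light_intensity / heavy_intensity * heavy_spike_fmol


def quantify_all(peak_pairs, heavy_spike_fmol):
    """Apply the same spike amount to every protein covered by the QCAT."""
    return {
        protein: quantify(light, heavy, heavy_spike_fmol)
        for protein, (light, heavy) in peak_pairs.items()
    }


# Hypothetical (light, heavy) peak intensities for three proteins.
peaks = {"proteinA": (2000.0, 1000.0), "proteinB": (500.0, 1000.0),
         "proteinC": (1000.0, 1000.0)}
print(quantify_all(peaks, heavy_spike_fmol=50.0))
```

Because the heavy peptides all come from one concatemer, a single spike calibrates every protein at once, which is what makes ~100 proteins per experiment feasible.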

Protein-protein interactions

• Types of interactions
  – Stable
    • Multimers, complexes
      – Association forms complete unit
      – Quaternary structure
  – Unstable
    • Pathways
    • Signaling events
    • Transient interactions

Yeast two-hybrid

How accurate is the Y2H data?

• False negatives
  – Proteins that have very transient or sporadic interactions, or that may be located in the membrane
  – Non-physiological test conditions
• False positives
  – Self-activators
  – Weak non-specific interactions
  – Non-physiological test conditions

How to assess

• Remove proteins with an above-average number of interactions

• Intersection of a number of experiments (Y2H, Co-IP, and co-expression)

• Network properties.

• Other documented signals of interaction.
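The first two filters can be sketched on a toy data set. This is only an illustration, assuming interactions are unordered pairs and that "above average" means above the mean number of partners; the protein names and pair lists are hypothetical.

```python
from collections import Counter


def normalize(pairs):
    """Treat (A, B) and (B, A) as the same undirected interaction."""
    return {frozenset(pair) for pair in pairs}


def remove_promiscuous(interactions):
    """Drop interactions involving proteins with above-average degree."""
    degree = Counter(p for pair in interactions for p in pair)
    mean_degree = sum(degree.values()) / len(degree)
    return {
        pair for pair in interactions
        if all(degree[p] <= mean_degree for p in pair)
    }


# Hypothetical results from two experiment types.
y2h = normalize([("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")])
co_ip = normalize([("C", "B"), ("D", "E")])

filtered = remove_promiscuous(y2h)   # protein A is a likely self-activator
confident = filtered & co_ip         # seen in both experiments
print(sorted(sorted(pair) for pair in confident))
```

Intersecting data sets trades coverage for accuracy: it removes many false positives but cannot recover interactions that every method misses.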

Network comparison

• Genome Biology 2006, Volume 7, Issue 11, Article 120

How to find protein/DNA interactions

• Consider a typical TRANSFAC binding site: 10 bp long, with 2 bases somewhat ambiguous. How often does it appear by chance in the genome?

• How can you determine if genes are co-expressed?
  – DNA footprinting
  – Deletion experiments

• High throughput?
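The chance-occurrence question can be answered with a rough estimate. The sketch below assumes independent, equally frequent bases, that each ambiguous position accepts 2 of the 4 bases, and a 3 x 10^9 bp genome scanned on both strands; all of these are simplifying assumptions, not figures from the slides.

```python
def expected_chance_hits(site_length=10, ambiguous_positions=2,
                         choices_per_ambiguous=2,
                         genome_size=3_000_000_000, both_strands=True):
    """Expected number of random matches to a binding-site motif."""
    exact_positions = site_length - ambiguous_positions
    # Probability that a random window matches the motif.
    p_match = (1 / 4) ** exact_positions \
        * (choices_per_ambiguous / 4) ** ambiguous_positions
    windows = genome_size - site_length + 1
    strands = 2 if both_strands else 1
    return p_match * windows * strands


print(round(expected_chance_hits()))
```

Under these assumptions the motif is expected tens of thousands of times by chance alone, so sequence matching by itself cannot identify real binding sites.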

ChIP on chip

Design

• Need very specific antibody for each transcription factor that you wish to study

• cDNA will not work with large introns
  – Whole genome chips
  – Human 21, 22
  – 3 x 10^6 spots
• SAGE
• Look for enriched vs non-enriched
  – Looking for a population rather than one sequence
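The enriched vs non-enriched comparison can be sketched as a log ratio per spot. This is a simplified illustration, assuming already-normalized intensities for the immunoprecipitated (IP) and control samples and an arbitrary 2-fold threshold; the probe names and values are hypothetical.

```python
import math


def enriched_probes(ip_signal, control_signal, min_log2_ratio=1.0):
    """Return probes whose IP/control log2 ratio reaches the threshold."""
    hits = {}
    for probe, ip_value in ip_signal.items():
        log2_ratio = math.log2(ip_value / control_signal[probe])
        if log2_ratio >= min_log2_ratio:
            hits[probe] = log2_ratio
    return hits


# Hypothetical spot intensities on a tiling array.
ip = {"probe1": 400.0, "probe2": 100.0, "probe3": 90.0}
control = {"probe1": 100.0, "probe2": 100.0, "probe3": 120.0}
print(enriched_probes(ip, control))
```

Real analyses look at runs of neighbouring probes and use replicate-based statistics, since the immunoprecipitated fragments form a population spread around each binding site.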

Results

Annotation

• Systematically adding knowledge
  – Human vs computer
    • Throughput
    • Accuracy
    • Repeatability
• Typical course
  – Found in one organism
    • Mapped to all other homologous segments
  – Function as a consequence of sequence

Prosite

• PROSITE is a method of determining what is the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites and patterns formulated in such a way that with appropriate computational tools it can rapidly and reliably identify to which known family of protein (if any) the new sequence belongs.

• http://ca.expasy.org/prosite/
• Take a smaller segment of the protein and build up annotation for the whole protein
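Pattern-based annotation of this kind can be sketched by translating a PROSITE pattern into a regular expression. The converter below handles only the core syntax (x, [..], {..}, repeat counts, and the < and > anchors outside brackets) and is an illustration, not the tool PROSITE itself uses. The example pattern is the N-glycosylation site, N-{P}-[ST]-{P}; the test sequence is invented.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern to a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")   # repeats
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # termini
        parts.append(element)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Start positions (0-based) of all matches, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [match.start() for match in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_sites("N-{P}-[ST]-{P}.", "MNGTAANPSA"))
```

A hit on a short segment like this is then used as evidence for annotating the whole protein, which is why short, common patterns need corroboration.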

Structured languages

• The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD), in 1998. Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes. See the GO Consortium page for a full list of member organizations.

• http://www.geneontology.org/GO.doc.shtml

Other Types

• Systems biology

• Protein structure

• Enzymatic pathways

Kegg API example

• http://sial.org/howto/perl/life-with-cpan/non-root/

Bioperl annotation examples

• Get info from genbank

• Graphical annotation
