proteomics and annotation. definition of proteomics study of all the proteins in an organism derived...

Proteomics and annotation

Upload: vincent-riley

Post on 01-Jan-2016

Definition of proteomics

• Study of all the proteins in an organism

• Derived from genomics (the study of all the DNA in an organism)

• On some levels it is a catalog of all the functional proteins, but in many contexts it is also the study of the interactions of the proteins

Central Dogma

• DNA --> RNA --> AA (amino acid sequence, i.e. protein) --> function
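A minimal Python sketch of the DNA --> RNA --> AA steps: transcribe a coding-strand DNA sequence into mRNA, then translate it codon by codon with the standard genetic code. It assumes the sequence starts at the first base of a codon and stops at the first stop codon; the function names are illustrative, not taken from any library.

```python
# Standard genetic code: amino acids for all 64 codons, listed with the
# first, second and third base each cycling through U, C, A, G ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (same sequence with T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons in frame from position 0 until a stop codon or the end."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCAAATGGTAA")))  # prints MAKW
```

The "function" step is the part no lookup table covers, which is why the rest of this deck turns to experiments and annotation.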

Proteomics techniques

• Protein identification/quantification
  – High throughput is still elusive

• Now typically
  – Separate
  – Isolate
  – Identify

• Enumerating protein interactions
  – Protein-protein
  – Protein-DNA/RNA

How to separate proteins

• Proteins are made up of 20 amino acids (AA), not 4 nucleotides (NT)
  – DNA: separated by size (migration through an electric field)
  – Protein:
    • Size
    • Charge
    • Hydrophobicity
    • Solubility
    • Fraction of the cell
    • Much more structure
    • …
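Of the properties listed, hydrophobicity is easy to put a number on. One common choice (not necessarily the one used in this course) is the Kyte-Doolittle hydropathy scale; the sketch below computes its mean over a sequence (the GRAVY score) and a sliding-window profile. The window size is a free parameter chosen here for illustration.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Mean hydropathy of each window; sustained high values suggest
    hydrophobic stretches such as candidate membrane-spanning segments."""
    seq = sequence.upper()
    return [gravy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


print(round(gravy("MAKWVTFISLLR"), 2))
```

A positive GRAVY score means the sequence is, on average, hydrophobic; this is one reason membrane proteins behave so differently from soluble ones during separation.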

2D gels

[Figure: 2D gel. One axis runs from pH 3 to pH 10; the other runs from big to little proteins.]
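Along the pH axis of a 2D gel, each protein migrates until it reaches its isoelectric point (pI), the pH at which its net charge is zero. A rough pI can be estimated from sequence alone with the Henderson-Hasselbalch equation and a bisection search. The pKa values below are one approximate set chosen for illustration; published tables differ, and this sketch ignores modifications and folding, so treat its output as an estimate.

```python
# Approximate pKa values (an assumed, commonly quoted set; other tables differ).
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6
POSITIVE_SIDE_CHAINS = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_SIDE_CHAINS = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at the given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free carboxyl terminus
    for aa in sequence.upper():
        if aa in POSITIVE_SIDE_CHAINS:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_SIDE_CHAINS[aa]))
        elif aa in NEGATIVE_SIDE_CHAINS:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_SIDE_CHAINS[aa] - ph))
    return charge


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisection over pH 0-14; net charge falls steadily as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("DDDD"), 1), round(isoelectric_point("KKKK"), 1))
```

Acidic sequences land near the low-pH end of the gel and basic ones near the high-pH end; very acidic or very basic proteins can fall outside a pH 3 to 10 strip entirely.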

Limitations of 2D

• Very large and very small proteins don’t work well

• Membrane-bound proteins
  – Solubility of the protein
  – Disulfide bonds

• Rare proteins
  – Can stain with silver stain
    » Non-linear
    » 100X

Mass spectrometry

• Simple principle
  – Explode the charged peptides off the sample
    • Electrospray: charged cone
    • Laser -> vapor -> charged grid
  – See how big they are
    • Detect the number of ions at each mass-to-charge ratio (m/z)
  – Ion trap: kind of like a TV
  – TOF (time of flight): how long the ions take to fly a fixed distance
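A TOF analyzer gives every ion of charge z the same kinetic energy, zeU, from an accelerating voltage U, then times how long each ion takes to cross a drift tube of length L. From zeU = ½mv², flight time grows with the square root of m/z. The sketch below just evaluates that relation; the 20 kV voltage and 1 m tube are hypothetical values, not taken from any particular instrument.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da: float, charge: int, accel_voltage: float, drift_length: float) -> float:
    """Seconds for an ion to cross a field-free drift tube.

    Energy balance: z*e*U = (1/2)*m*v**2, so v = sqrt(2*z*e*U/m) and t = L/v.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_voltage / mass_kg)
    return drift_length / velocity


# Hypothetical settings: 20 kV acceleration, 1 m drift tube.
for mass in (1000, 4000):
    print(mass, "Da:", round(flight_time(mass, 1, 20_000, 1.0) * 1e6, 1), "microseconds")
```

A singly charged 1000 Da ion comes out on the order of 16 microseconds under these settings, and quadrupling the mass doubles the flight time.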

Mass of AA
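Mass-spec software works from residue masses like these. The sketch sums standard monoisotopic residue masses, adds one water for the free termini, and adds one proton for the singly charged [M+H]+ ion. It uses monoisotopic values only (the mass of the most abundant isotope of each element); average masses are slightly higher.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def mh_plus(sequence: str) -> float:
    """Singly protonated [M+H]+ ion."""
    return peptide_mass(sequence) + PROTON


print(round(peptide_mass("PEPTIDE"), 3), round(mh_plus("PEPTIDE"), 3))
```

Note that leucine and isoleucine have identical masses, so mass alone cannot tell them apart.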

[Figure: mass spectrum, annotated with the actual mass, the major ion (+H), and the C13 peak.]
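The spectrum never shows the actual (neutral) mass directly: the main peak is the protonated ion, and because about 1.1% of natural carbon is 13C, smaller peaks follow roughly 1 Da higher for each extra 13C atom. For an ion carrying z protons, both the position and the isotope spacing are divided by z. A minimal sketch of that arithmetic:

```python
PROTON = 1.007276        # Da
C13_SPACING = 1.003355   # Da, mass difference between 13C and 12C


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def isotope_envelope(neutral_mass: float, charge: int, n_peaks: int = 4) -> list:
    """m/z of the monoisotopic peak followed by the +1, +2, ... 13C peaks."""
    return [mz(neutral_mass + i * C13_SPACING, charge) for i in range(n_peaks)]


for z in (1, 2):
    print(z, [round(x, 3) for x in isotope_envelope(799.360, z)])
```

The spacing between isotope peaks (about 1.003 for z = 1, about 0.502 for z = 2) is a quick way to read the charge state off a spectrum.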

Post-translational modification

• Cleavage
  – Removing portions of the protein by enzymatic action
  – Can change location, function, activity

• Additions
  – Adding a chemical group
    • Regulated activity
    • Can change protein function/activity

Modifications

• Phosphorylation: activate/inactivate

• Acetylation: stability (histones)

• Acylation: membrane association

• Glycosylation: signaling

• GPI anchor: membrane association

• Hydroxyproline: stability

• Sulfation: protein-protein interaction

• Disulfide: stability

• Deamination: protein-protein interaction

• Pyroglutamic acid: stability

• Ubiquitination: destruction signal
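Mass spectrometry detects many of these modifications because each one shifts a peptide's mass by a fixed amount. The sketch lists standard monoisotopic shifts for some of the modifications in the list above; acylation, glycosylation and GPI anchors are left out because their mass depends on the particular group attached. The helper function is illustrative.

```python
# Monoisotopic mass changes (Da) for the chemical change each modification makes.
MOD_DELTA = {
    "phosphorylation": 79.966331,               # + HPO3
    "acetylation": 42.010565,                   # + C2H2O
    "hydroxyproline": 15.994915,                # + O on proline
    "sulfation": 79.956815,                     # + SO3
    "disulfide": -2.015650,                     # - 2 H per bond formed
    "pyroglutamic acid": -17.026549,            # - NH3 from N-terminal Gln
    "ubiquitination (GG remnant)": 114.042927,  # Gly-Gly left on Lys after trypsin
}


def modified_mass(unmodified_mass: float, modifications: list) -> float:
    """Add the mass shift of every listed modification to a peptide mass."""
    return unmodified_mass + sum(MOD_DELTA[m] for m in modifications)


gap = MOD_DELTA["phosphorylation"] - MOD_DELTA["sulfation"]
print(round(modified_mass(1000.0, ["phosphorylation"]), 4), round(gap, 4))
```

Phosphorylation and sulfation differ by less than 0.01 Da, so telling them apart by mass alone needs a high-resolution instrument.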

Limitations of mass spec

• Most frequently sequenced protein: keratin (a common contaminant from skin and hair)
  – Ionization is not strictly quantitative

• Can cleave the protein into peptides
  – Complicated by mixtures
  – Issues with searching the database

• http://prospector.ucsf.edu/

• http://fields.scripps.edu/sequest/index.html
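Cleaving proteins into peptides and searching a database can be sketched as peptide mass fingerprinting: digest each database protein in silico with the usual trypsin rule (cut after K or R, but not before P), compute the peptide masses, and count how many observed masses each protein explains. The protein names, sequences and the 0.02 Da tolerance are hypothetical, and the scoring is deliberately naive compared with the tools linked on this slide.

```python
# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_digest(protein: str) -> list:
    """Cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_proteins(observed_masses, database, tolerance=0.02):
    """Score each protein by how many observed masses its predicted peptides match."""
    scores = {}
    for name, sequence in database.items():
        predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(obs - pred) <= tolerance for pred in predicted)
            for obs in observed_masses
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


database = {"protA": "MAGKWVTFISLLRGE", "protB": "GDSEKAPLNTLVR"}  # hypothetical
observed = [peptide_mass("WVTFISLLR"), peptide_mass("MAGK")]
print(rank_proteins(observed, database))  # [('protA', 2), ('protB', 0)]
```

With a mixture of proteins, observed masses from several sources land in one list and small proteins contribute few peptides, which is where the complications noted above come from.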

QCAT

• A way to quantitatively analyze multiple proteins (Nature Methods 2, 587-589 (2005)).

• Depends on concatemers assembled from segments of the proteins of interest. Each protein is represented by one segment, chosen to be a peptide that a tryptic digest would produce (QCAT).
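The concatemer idea can be sketched as: pick one tryptic peptide per protein of interest and join them into one sequence. The selection rule below (first tryptic peptide of 6 to 20 residues that ends in K or R and does not start with P) is a hypothetical, simplistic stand-in for real QCAT peptide selection, and the sketch produces only the peptide string, not a full expression construct. Protein names and sequences are made up.

```python
def tryptic_digest(protein: str) -> list:
    """Cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def pick_signature_peptide(protein: str, min_len: int = 6, max_len: int = 20):
    """Hypothetical rule: first tryptic peptide of suitable length ending in K/R."""
    for peptide in tryptic_digest(protein):
        if min_len <= len(peptide) <= max_len and peptide[-1] in "KR" and peptide[0] != "P":
            return peptide
    return None


def build_qcat(proteins: dict):
    """Concatenate one signature peptide per protein into a single sequence."""
    chosen = {}
    for name, sequence in proteins.items():
        peptide = pick_signature_peptide(sequence)
        if peptide is None:
            raise ValueError(f"no suitable tryptic peptide for {name}")
        chosen[name] = peptide
    return "".join(chosen.values()), chosen


qcat, chosen = build_qcat({"p1": "MAGKWVTFISLLRGE", "p2": "GDSEKAPLNTLVR"})
print(qcat)                  # WVTFISLLRAPLNTLVR
print(tryptic_digest(qcat))  # ['WVTFISLLR', 'APLNTLVR']
```

Because every chosen peptide ends in K or R and none starts with P, trypsin cuts the concatemer back into exactly the chosen peptides, one per protein.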

Cont.

• Grow the peptide with heavy and light isotopes and use them to build a standard curve

• Spike your sample with heavy QCAT

• This produces an internal standard for each protein of interest.

• This allows quantitation of many (~100) proteins in one experiment.
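The internal-standard arithmetic can be sketched as follows: the spiked heavy QCAT peptide and the native (light) peptide co-elute but differ in mass, so the light/heavy peak-area ratio times the known spiked amount estimates the native amount. A standard curve, fitted here by ordinary least squares to hypothetical known-ratio mixtures, corrects for any systematic bias in the measured ratio. Function names, units and numbers are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(light_area, heavy_area, heavy_spike_fmol, slope=1.0, intercept=0.0):
    """Native (light) amount using the spiked heavy peptide as internal standard.

    The standard curve relates measured ratio to true ratio:
    measured = slope * true + intercept, so invert it, then scale by the spike.
    """
    measured_ratio = light_area / heavy_area
    true_ratio = (measured_ratio - intercept) / slope
    return true_ratio * heavy_spike_fmol


# Hypothetical standard curve: known light:heavy ratios vs. measured ratios.
known = [0.5, 1.0, 2.0, 4.0]
measured = [0.55, 1.05, 2.05, 4.05]
slope, intercept = fit_line(known, measured)
print(round(quantify(2050, 1000, 50, slope, intercept), 3))  # fmol of native peptide
```

Since every protein in the QCAT gets its own heavy peptide from the same spike, the same calculation runs once per protein in a single experiment.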

Protein-protein interactions

• Types of interactions
  – Stable
    • Multimers, complexes
      – Association forms a complete unit
      – Quaternary structure
  – Unstable
    • Pathways
    • Signaling events
    • Transient interactions

Yeast two-hybrid

How accurate is the Y2H data?

• False negatives
  – Proteins that have very transient or sporadic interactions, or that are located in the membrane
  – Non-physiological test conditions

• False positives
  – Self-activators
  – Weak non-specific interactions
  – Non-physiological test conditions

How to assess

• Remove proteins with an above-average number of interactions

• Intersection of a number of experiments (Y2H, Co-IP, and co-expression)

• Network properties.

• Other documented signals of interaction.
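Two of these strategies are easy to sketch on edge lists: dropping proteins with an above-average number of interactions (taken literally here as degree greater than the mean degree) and keeping only interactions reported by at least two independent experiments. The protein names and the three experiment lists are hypothetical.

```python
from collections import Counter
from itertools import chain


def as_edges(pairs):
    """Undirected interactions as frozensets; self-interactions are dropped."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def remove_hubs(edges):
    """Drop every interaction involving a protein with above-average degree."""
    degree = Counter(chain.from_iterable(edges))
    if not degree:
        return set()
    mean_degree = sum(degree.values()) / len(degree)
    hubs = {protein for protein, d in degree.items() if d > mean_degree}
    return {edge for edge in edges if not edge & hubs}


def supported_edges(experiments, min_support=2):
    """Keep interactions seen in at least `min_support` experiments."""
    counts = Counter(chain.from_iterable(experiments))
    return {edge for edge, n in counts.items() if n >= min_support}


y2h = as_edges([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
co_ip = as_edges([("A", "B"), ("B", "C")])
co_expression = as_edges([("B", "C"), ("C", "D")])

print(sorted(sorted(e) for e in supported_edges([y2h, co_ip, co_expression])))
print(sorted(sorted(e) for e in remove_hubs(y2h)))
```

The intersection step trades coverage for confidence: a real interaction missed by all but one method is discarded along with the false positives.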

Network comparison

• Genome Biology 2006, Volume 7, Issue 11, Article 120

How to find protein/DNA interactions

• Take a typical Transfac binding site: 10 bp long, with 2 somewhat ambiguous bases. How often would it appear by chance in the genome?

• How can you determine whether genes are co-expressed?
  – DNA footprinting
  – Deletion experiments

• High throughput?
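A back-of-the-envelope answer to the first question, under stated assumptions: all four bases equally likely and independent, each "somewhat ambiguous" position allowing 2 of the 4 bases, both strands scanned, and a genome of 3 × 10^9 bp (roughly human). The motif below is a made-up 10-mer written in IUPAC codes (R = A/G, Y = C/T); only its length and degeneracy matter. The match probability is (1/4)^8 × (1/2)^2 = 1/262,144.

```python
# Number of bases each IUPAC nucleotide code allows.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def match_probability(motif: str) -> float:
    """Chance a random position matches, assuming equal, independent base frequencies."""
    p = 1.0
    for code in motif.upper():
        p *= IUPAC_COUNTS[code] / 4
    return p


def expected_hits(motif: str, genome_length: int, both_strands: bool = True) -> float:
    """Expected chance occurrences (a palindromic motif is counted once per strand)."""
    strands = 2 if both_strands else 1
    positions = genome_length - len(motif) + 1
    return strands * positions * match_probability(motif)


motif = "ACGTRCAYGT"  # hypothetical 10-mer with two 2-fold ambiguous positions
print(1 / match_probability(motif))             # 262144.0
print(round(expected_hits(motif, 3_000_000_000)))
```

That comes to roughly 23,000 chance matches, far more than any factor has real binding sites, which is why sequence scanning alone cannot identify functional protein/DNA interactions.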

ChIP on chip

Design

• Need a very specific antibody for each transcription factor you wish to study

• cDNA will not work with large introns
  – Whole-genome chips
  – Human chromosomes 21 and 22
  – 3 x 10^6 spots

• SAGE

• Look for enriched vs. non-enriched
  – Looking for a population rather than one sequence
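"Enriched vs. non-enriched" can be sketched per array probe: compare the immunoprecipitated (ChIP) signal with a control channel, take log2 ratios, and report runs of adjacent probes above a threshold as candidate binding regions. The pseudocount, threshold and minimum run length are hypothetical parameters, and real ChIP-chip analysis adds normalization and statistical testing that this sketch leaves out.

```python
import math


def log2_ratios(chip, control, pseudocount=1.0):
    """Per-probe log2(ChIP / control); the pseudocount avoids division by zero."""
    return [math.log2((c + pseudocount) / (i + pseudocount)) for c, i in zip(chip, control)]


def enriched_regions(ratios, threshold=1.0, min_probes=2):
    """(first, last) probe indices of runs at or above `threshold`."""
    regions, start = [], None
    for idx, r in enumerate(ratios + [float("-inf")]):  # sentinel closes a final run
        if r >= threshold and start is None:
            start = idx
        elif r < threshold and start is not None:
            if idx - start >= min_probes:
                regions.append((start, idx - 1))
            start = None
    return regions


chip = [10, 10, 80, 90, 70, 10, 60, 10]   # hypothetical probe intensities
control = [10] * 8
print(enriched_regions(log2_ratios(chip, control)))  # [(2, 4)]
```

Requiring several adjacent probes reflects the population point above: sheared ChIP fragments overlap a binding site from many positions, so a true site tends to light up a stretch of probes rather than a single spot.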

Results

Annotation

• Systematically adding knowledge
  – Human vs. computer
    • Throughput
    • Accuracy
    • Repeatability

• Typical course
  – Found in one organism
    • Mapped to all other homologous segments
  – Function as a consequence of sequence

Prosite

• PROSITE is a method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites and patterns formulated in such a way that, with appropriate computational tools, it can rapidly and reliably identify which known protein family (if any) the new sequence belongs to.

• http://ca.expasy.org/prosite/

• Take a smaller segment of the protein and build up annotation for the whole protein
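PROSITE patterns are written in a compact syntax that maps closely onto regular expressions: elements are separated by "-", x means any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives repeats, and < or > anchors a terminus. The sketch translates a subset of that syntax into a Python regex and scans a sequence with it; the example is the N-glycosylation site pattern N-{P}-[ST]-{P} (PROSITE PS00001), and the test sequences are made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern to a Python regular expression.

    Handles x, [..], {..}, (n) / (n,m) repeats and < / > terminal anchors.
    Does not handle '>' inside brackets (e.g. [G>]).
    """
    regex = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            token = "."
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"
        else:
            token = core  # a single residue, or [..] which is already valid regex
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        regex.append(prefix + token + suffix)
    return "".join(regex)


glycosylation = prosite_to_regex("N-{P}-[ST]-{P}.")
print(glycosylation)                                    # N[^P][ST][^P]
print([m.start() for m in re.finditer(glycosylation, "MKNGSAQNPTL")])  # [2]
```

Such short patterns match often by chance, which is why a single hit only suggests, rather than proves, membership in a family.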

Structured languages

• The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD), in 1998. Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes. See the GO Consortium page for a full list of member organizations.

• http://www.geneontology.org/GO.doc.shtml
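What a structured vocabulary buys you can be sketched in a few lines: GO terms form a directed acyclic graph, so a gene annotated to a specific term is implicitly annotated to every ancestor of that term (the idea GO has called the "true path rule"). The term hierarchy and gene name below are a toy fragment for illustration only, not copied from a current GO release.

```python
# Toy is_a hierarchy (child -> parents); illustrative, not an actual GO release.
PARENTS = {
    "apoptotic process": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": ["biological_process"],
    "biological_process": [],
}


def ancestors(term, parents):
    """All terms reachable by following parent links (works on any DAG)."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


print(sorted(propagate({"geneX": ["apoptotic process"]}, PARENTS)["geneX"]))
```

Because the vocabulary is shared, the same query (for example, "all genes under cell death") works across every database that uses it.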

Other Types

• Systems biology

• Protein structure

• Enzymatic pathways

KEGG API example

• http://sial.org/howto/perl/life-with-cpan/non-root/
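The slide links only to instructions for installing Perl modules without root access. As a stand-in (not the original example), the sketch below uses KEGG's REST interface at https://rest.kegg.jp from Python's standard library, where URLs take the form /operation/argument. `kegg_list("pathway", "hsa")` would fetch the list of human pathways; it needs network access, so only the URL builder and parser are exercised offline here. The helper names are illustrative.

```python
import urllib.request

KEGG_REST = "https://rest.kegg.jp"


def kegg_url(operation: str, *arguments: str) -> str:
    """Build a KEGG REST URL, e.g. https://rest.kegg.jp/list/pathway/hsa."""
    return "/".join([KEGG_REST, operation, *arguments])


def parse_tab_listing(text: str) -> dict:
    """Parse 'list'-style output: one entry per line, identifier TAB description."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            identifier, _, description = line.partition("\t")
            entries[identifier] = description
    return entries


def kegg_list(*arguments: str) -> dict:
    """Fetch and parse a KEGG 'list' query (requires network access)."""
    with urllib.request.urlopen(kegg_url("list", *arguments)) as response:
        return parse_tab_listing(response.read().decode("utf-8"))


print(kegg_url("list", "pathway", "hsa"))
```

Keeping the URL construction and parsing separate from the network call makes both easy to test without contacting the server.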

Bioperl annotation examples

• Get info from genbank

• Graphical annotation
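The slide's examples used BioPerl. As a Python standard-library stand-in, the sketch below builds a request to NCBI's E-utilities `efetch` endpoint for a GenBank flat-file record and pulls a single header field (such as DEFINITION) out of the text. The field parser handles only the simple layout of a 12-character keyword column plus indented continuation lines; for real work a full parser such as BioPerl's Bio::SeqIO is the safer choice. NCBI rate-limits these requests, so batch them in practice. The sample record in the usage lines is made up.

```python
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_url(accession: str) -> str:
    """efetch URL returning a nucleotide record in GenBank flat-file format."""
    query = urllib.parse.urlencode(
        {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    )
    return EFETCH + "?" + query


def fetch_genbank(accession: str) -> str:
    """Download the record (requires network access)."""
    with urllib.request.urlopen(genbank_url(accession)) as response:
        return response.read().decode("utf-8")


def genbank_field(record_text: str, keyword: str) -> str:
    """Join a header field and its continuation lines (keyword in columns 1-12)."""
    value, inside = [], False
    for line in record_text.splitlines():
        key = line[:12].strip()
        if key == keyword:
            inside = True
            value.append(line[12:].strip())
        elif inside and key == "":
            value.append(line.strip())
        elif inside:
            break
    return " ".join(value)


sample = (
    "LOCUS       TEST0001                  30 bp    DNA     linear   SYN 01-JAN-2000\n"
    "DEFINITION  Hypothetical test sequence\n"
    "            spanning two lines.\n"
    "ACCESSION   TEST0001\n"
)
print(genbank_field(sample, "DEFINITION"))  # Hypothetical test sequence spanning two lines.
```

The FEATURES table, which holds the coordinates a graphical annotation would draw, uses a more complex layout and is where a real parser earns its keep.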