Correlating mRNA and protein abundance via genomic and proteomic characteristics
Dov Greenbaum
Gerstein Lab Thesis Seminar
April 21, 2004
Outline
• Why analyze mRNA and protein correlations
• Background
• Disparate data sources
• Correlating mRNA and protein
• Results
• Other analyses
• Formalism – comparing genome, transcriptome and proteome in terms of broad categories
• New data sets
• Analysis via broad categories
• Analysis of factors affecting correlations
• Another reason to expect correlations
• Expression and protein interactions
Why Correlate mRNA & Protein?
[Figure: bar chart of the number of experiments, mRNA vs. protein; y-axis 0 to 5000]
Both mRNA and protein levels are necessary for complete analysis
Combinations of RNA and protein detection approaches have recently aided in the identification of biomarkers in cancer (Hegde et al., Current Opinion in Biotech 2003)
Shown mathematically in Hatzimanikatis et al., Biotechnology 1999
Relationship between mRNA and Protein levels
$$\frac{dP_i}{dt} = k_{s,i}\,\mathrm{mRNA}_i - k_{d,i}\,P_i$$
where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants, respectively.
At steady state:
$$P_i = \frac{k_{s,i}\,\mathrm{mRNA}_i}{k_{d,i}}$$
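The rate equation above can be checked numerically. The following is a minimal sketch, not taken from the talk: it integrates the equation with forward Euler steps and compares the result with the closed-form steady state. The function names, step size, and the rate constants are hypothetical values chosen only for illustration.

```python
def simulate_protein(mrna, k_s, k_d, p0=0.0, dt=0.01, steps=5000):
    """Integrate dP/dt = k_s * mRNA - k_d * P with forward Euler steps."""
    p = p0
    for _ in range(steps):
        p += dt * (k_s * mrna - k_d * p)
    return p


def steady_state(mrna, k_s, k_d):
    """Closed-form steady state: P = k_s * mRNA / k_d."""
    return k_s * mrna / k_d


# Hypothetical values, chosen only for illustration.
mrna_level, k_syn, k_deg = 10.0, 2.0, 0.5
p_sim = simulate_protein(mrna_level, k_syn, k_deg)
p_ss = steady_state(mrna_level, k_syn, k_deg)
print(p_sim, p_ss)
```

With a constant mRNA level the simulated protein level approaches the steady-state value, so two genes with the same mRNA level can still differ in protein level whenever their synthesis or degradation constants differ.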
Methods for determining mRNA expression
Each has strengths and weaknesses
Methods for determining protein abundance
2DE Gel Electrophoresis
– (Klose, 1975; O’Farrell, 1975)
• Multiple staining options
• Small dynamic range
• Limited in what it can detect
Methods for determining protein abundance
ICAT
– ICAT reagent: relative levels
– VB dynamic range
– Cannot detect post-translational modifications
– It requires proteins to contain cysteine residues, and these residues must be in the region of a peptide that is produced during proteolytic cleavage
MudPit
Really the only high-throughput (HT) method that can detect post-translational (PT) modifications
Other Methods for determining protein abundance
DIGE
– e.g. Cy3 vs Cy5 labeling
– Very big dynamic range
2D-electrophoresis
TAP tagging: Weissman & O’Shea (Oct 2003)
Other Methods for determining protein abundance
[Figure: two bar charts over the methods 2DE, DIGE, ICAT, MudPIT and TAP; the first (y-axis 0 to 80,000) also shows Affy and Max, the second (y-axis 0 to 4,000) shows Max Prot]
Same mRNA levels yet protein data varied > 20X
N ~100, r = 0.9
Protein Quantification via measurement of radioactivity
Gygi et al., Molecular and Cellular Biology, 1999
Same mRNA levels yet protein data varied > 20X
Do some ORFs bias the results?
73 proteins (69%) R = 0.356
mRNA vs Protein
r = 0.74
Protein Quantification via image analysis
Futcher et al Molecular and Cellular Biology, 1999
Jury is out…
Gygi et al: “This study revealed that transcript levels provide little predictive value with respect to the extent of protein expression.”
Futcher et al: “there is a good correlation between protein abundance and mRNA abundance for the proteins that we have studied.”
mRNA vs Protein
r = 0.67
Greenbaum et al Bioinformatics 2001
3 Genes in Lung Adenocarcinomas
Op18, Annexin IV, and GAPD: r = 0.025
Chen et al Molecular & Cellular Proteomics, 2002.
Murine hematopoietic precursor MPRO
Change in expression, 0 - 72 hr
R = 0.58
~80% of the genes are located in the first and third quadrants
Ratios of wt +gal to wt -gal
ICAT vs microarray
N ~ 290, r = 0.6
Ideker et al Science, 2001
Yeast growth under two different media
r = 0.45, but almost 1.0 for same loci in same pathway
Washburn et al PNAS 2003
Integrating multiple sources of Information
“The challenge for computational biology is to provide methodologies for transforming high-throughput heterogeneous data sets into biological insights about the underlying mechanisms. Although high-throughput assays provide a global picture, the details are often noisy, hence conclusions should be supported by several types of observations. Integration of data from assays that examine cellular systems from different viewpoints (for instance, gene expression and protein-protein interactions) can lead to a more coherent reconstruction and reduce the effects of noise.” Nir Friedman, Science 2004
Sources of Data

| Data set | Description | Size [ORFs] | Reference |
|---|---|---|---|
| mRNA expression | | | |
| Young | Gene chip profiles of yeast cells with mutations that affect transcription | 5455 | Holstege et al. (1998) |
| Church | Gene chip profiles of yeast cells under four different conditions | 6263 | Roth et al. (1998) |
| Samson | Comparing gene chip profiles for yeast cells subjected to alkylating agent | 6090 | Jelinsky et al. (1998) |
| SAGE | Yeast cells during vegetative growth | 3778 | Velculescu et al. (1997) |
| Reference expression | Scaling and integrating the mRNA expression sets into one data source | 6249 | - |
| Protein abundance | | | |
| 2-DE #1 | Measurement of yeast protein abundance by two-dimensional (2D) gel electrophoresis and mass spectrometry | 156 | Gygi et al. (1999) |
| 2-DE #2 | Similar to 2-DE set #1 | 71 | Futcher et al. (1999) |
| Transposon | Large-scale fusions of yeast genes with lacZ by transposon insertion | 1410 | Ross-Macdonald et al. (1999) |
| Reference abundance | Scaling and integrating the 2-DE data sets into one data source | 181 | - |
| Annotation | | | |
| Annotated localization | Subcellular localizations of yeast proteins | 2133 (6280) | Drawid et al. (2000) |
| Transmembrane segments | Predicted transmembrane and soluble proteins in yeast | 2710 (6280) | Gerstein (1998) |
| MIPS functions | Functional categories for yeast ORFs | 3519 (6194) | Mewes et al. (2000) |
| GOR secondary structure | Predicted secondary structure for yeast ORFs | 6280 | Gerstein (1998) |
Reference mRNA Sets
[Figure: panels for the Young, Church, Samson and SAGE data sets]
Fitting Protein Data
Original Set
mRNA vs Protein
r = 0.67
Greenbaum et al Bioinformatics 2001
mRNA expression reference set: 3 Affy chip sets and SAGE, 6249 ORFs
Outliers (2 standard deviations from the mean)

| ORF | Function | MIPS |
|---|---|---|
| YBR118W | Translation elongation factor eEF1 alpha-A chain | 5, 30 |
| YER065C | Isocitrate lyase | 1, 2, 30 |
| YMR303C | Alcohol dehydrogenase II | 1, 2, 30 |
| YOL086C | Alcohol dehydrogenase I | 1, 2, 30 |
| YJR009C | Glyceraldehyde-3-phosphate dehydrogenase 2 | 1, 2, 30 |
| YGR192C | Glyceraldehyde-3-phosphate dehydrogenase 3 | 1, 2, 30 |
| YJR104C | Copper-zinc superoxide dismutase | 11, 30 |
| YML054C | Lactate dehydrogenase cytochrome b2 | 1, 2, 30 |
| YJL052W | Glyceraldehyde-3-phosphate dehydrogenase 1 | 1, 2, 30 |
| YKR059W | Translation initiation factor | 5, 30 |
| YML008C | S-adenosyl-methionine delta-24-sterol-c-methyltransferase | 1, 30 |
| YFL022C | Phenylalanine-tRNA ligase beta chain | 5, 30 |
| YJL008C | Component of chaperonin-containing T-complex | 6, 30 |
| YPL160W | Leucine-tRNA ligase | 5, 30 |
| YOR361C | Translation initiation factor eIF3 subunit | 3, 5, 30 |
| YCL030C | Phosphoribosyl-AMP cyclohydrolase | 1 |
| YNL209W | Heat shock protein of HSP70 family | 5, 30 |

[Side labels on the table: above trendline; below trendline]
High protein: Metabolism (1), Energy (2)
Low protein: Prot. Syn. (5), Prot. Fate (6)
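The outlier criterion on this slide (points 2 standard deviations off the trendline) can be sketched as follows. This is an illustrative reconstruction, not the talk's actual procedure: it assumes a straight-line least-squares fit in log-log space, and the function name `trendline_outliers` and the data are hypothetical.

```python
import numpy as np


def trendline_outliers(mrna, protein, n_sd=2.0):
    """Flag points more than n_sd standard deviations off a log-log trendline.

    Returns two boolean arrays: (above_trendline, below_trendline).
    """
    x = np.log10(np.asarray(mrna, dtype=float))
    y = np.log10(np.asarray(protein, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)  # least-squares straight line
    residuals = y - (slope * x + intercept)
    cutoff = n_sd * residuals.std()
    return residuals > cutoff, residuals < -cutoff


# Hypothetical data: protein = 100 * mRNA, except one ORF with 1000x more protein.
mrna = np.arange(1, 31, dtype=float)
protein = 100.0 * mrna
protein[10] *= 1000.0
above, below = trendline_outliers(mrna, protein)
print(np.flatnonzero(above), np.flatnonzero(below))
```

ORFs flagged as "above" have more protein than their mRNA level predicts, and ORFs flagged as "below" have less.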
Later, larger data sets concurred with these results. Generally…
[Figure: log-log scatter plot of protein (1 to 10,000,000) vs. mRNA (0.1 to 1000)]
Alcohol dehydrogenase is also a stress-induced protein in many organisms (Matton et al. 1990; An et al. 1991; Millar et al. 1994). Faster ramp-up?
AA metabolism & Energy genes are twice as likely to have high protein vs. mRNA as the general population
Protein synthesis (~35% of all protein synthesis genes) and Protein fate (folding, modification, destination) genes are more likely to have low protein vs. mRNA than the general population
Non-Outliers Generally…
Tight regulation by the cell
Only 3% of transcription-associated genes (n = 441) have significantly uncorrelated mRNA and protein levels (2 standard deviations from the trendline)
Transcription Assoc. genes are 25% of the essential genes in yeast.
Essential Genes as a group have higher correlations than the general yeast population
7% of Cell Cycle associated genes (n = 432) have significant non-correlation
Quick Summary
• Why correlate mRNA and protein levels?
• Merged disparate data sets
– Distinct but complementary
• Global correlations
• Outliers are interesting:
– Metabolism & Energy: relatively high protein levels
– Protein Synthesis & Protein Fate: low protein levels
Data Set Size
~6,000 ORFs
~6,000 ORFs: 5 Affymetrix GeneChips + SAGE data
~170 ORFs: 2 DE-gel data sets
Enrichments
$$\Delta(\mathrm{Feature}, [v,S], [w,G]) = \frac{\mu(F,[v,S]) - \mu(F,[w,G])}{\mu(F,[w,G])}$$
v and w are the weights (expression levels) of sets S and G
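The enrichment formula can be illustrated with a short sketch. It assumes that each term (F,[w,G]) is the weighted average of feature F over the ORFs in set G, using the weights w; that reading, the function names, and the numbers below are assumptions made for illustration, not details confirmed by the slide.

```python
def weighted_mean(feature, weights):
    """Weighted average of a per-ORF feature (assumed meaning of (F,[w,G]))."""
    total = sum(weights)
    return sum(f * w for f, w in zip(feature, weights)) / total


def enrichment(feature_s, weights_s, feature_g, weights_g):
    """Relative difference of the weighted feature average in S versus G."""
    mu_s = weighted_mean(feature_s, weights_s)
    mu_g = weighted_mean(feature_g, weights_g)
    return (mu_s - mu_g) / mu_g


# Hypothetical example: three ORFs with a per-ORF feature value
# (for instance, a secondary-structure fraction).
feature = [0.2, 0.4, 0.6]
genome_weights = [1, 1, 1]      # genome: every ORF counted once
expression_weights = [1, 1, 8]  # transcriptome: weighted by expression level
e = enrichment(feature, expression_weights, feature, genome_weights)
print(round(e, 3))
```

A positive value means the feature is enriched in the expression-weighted population relative to the genome; a negative value means it is depleted.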
Visual Formalism
~170 ORFs ~6,000 ORFs
Depletion of Random Coil Secondary Structure STABILITY
Concurrence with data from Perczel et al., Chemistry 2003, regarding stability of specific secondary structures
Enrichment of Amino Acids STABILITY
Alanines, glycines and valines result in more compact structures
More compact = more stable (e.g. thermophilic enzymes tend to be very compact)
Enrichment of Amino Acids
Simple story: the translatome is enriched in the same way as the transcriptome
Enrichment of Molecular Weights/Biomass
Abundant proteins are smaller = reduces cost
The yeast cell favors the expression of shorter ORFs over longer ones (as opposed to long lightweight ORFs; see MW of aa)
This selection is happening, for the most part, at the transcriptome level
Negative correlation between ORF length and mRNA expression (Jansen & Gerstein 2000), and to a lesser degree with protein abundance
Effect of transcription
Enrichment of Molecular Weights/Biomass
Abundant proteins are smaller = reduces cost
Concurs with experimental results from Akashi, Genetics 2003
See also: Akashi, Genetics 1996 & Moriyama and Powell, NAR 1998
They hypothesize that this trend exists in S. cerevisiae, D. melanogaster and E. coli (although probably not in C. elegans)
Effect of transcription
Enrichment of Functional Categories
[Figure: log-log scatter plot of protein (1 to 10,000,000) vs. mRNA (0.1 to 1000)]
Depletion of Functional Categories
Transcription & Cell Growth
Molecular switches
Require only minimal expression
Enrichment of localization - BIAS?
(Drawid & Gerstein, 2000)
Review
Formalism
Different gene sets b/c of limited data
Enrichments
concur with experimental results
Fitting Protein Data
Newer set
MudPit is fit first into mRNA space, then inverse fit back into protein space; then each of the data sets is fit via least squares onto the Aebersold data set

| | Aebersold | Futcher | Reference | Yates | Gygi | mRNA |
|---|---|---|---|---|---|---|
| Aebersold | 125 | 29 | 113 | 102 | 116 | 125 |
| Futcher | | 73 | 61 | 56 | 64 | 69 |
| Reference | | | 150 | 143 | 128 | 150 |
| Yates | | | | 1436 | 785 | 1346 |
| Gygi | | | | | 1504 | 1480 |
| mRNA | | | | | | 6250 |
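The least-squares step can be sketched as below. This is only an illustration under stated assumptions: the slide does not say which functional form was fit, so a straight line in log-log space over the ORFs shared by two data sets is assumed here, and the function name, ORF labels and values are hypothetical.

```python
import numpy as np


def fit_onto_reference(reference, other):
    """Rescale `other` onto `reference` with a log-log least-squares line.

    Both arguments map ORF name -> abundance. The line is fit on shared ORFs
    only, then applied to every ORF in `other`.
    """
    shared = sorted(set(reference) & set(other))
    x = np.log10([other[orf] for orf in shared])
    y = np.log10([reference[orf] for orf in shared])
    slope, intercept = np.polyfit(x, y, 1)
    return {orf: 10 ** (slope * np.log10(v) + intercept) for orf, v in other.items()}


# Hypothetical data sets: `other` is measured on a scale 10x lower than `reference`.
reference = {"A": 10.0, "B": 100.0, "C": 1000.0}
other = {"A": 1.0, "B": 10.0, "C": 100.0, "D": 1000.0}
rescaled = fit_onto_reference(reference, other)
print(rescaled["D"])
```

ORF "D" is absent from the reference set, but after the fit it is expressed on the reference scale, which is how a small reference set can be extended with ORFs measured only in a larger set.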
Global Correlation
[Figure: log-log scatter plot of protein abundance (0.1 to 1000) vs. mRNA expression (0.1 to 1000); series: MudPit (1), MudPit (2), 2DE (1), 2DE (2)]
R = 0.66
mRNA set: 6249 ORFs
Protein set #2: 2 2DE sets & 2 MudPit sets, ~2000 ORFs
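A global correlation of this kind can be computed as in the sketch below. It assumes a Pearson correlation on log-transformed values, in keeping with the log-log axes of the plot; the slide itself does not state the exact procedure, and the function name and data are hypothetical.

```python
import numpy as np


def log_pearson(mrna, protein):
    """Pearson correlation between log10(mRNA) and log10(protein)."""
    x = np.log10(np.asarray(mrna, dtype=float))
    y = np.log10(np.asarray(protein, dtype=float))
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical values: a perfect power-law relationship gives r = 1.
r_perfect = log_pearson([1, 10, 100], [5, 50, 500])
r_inverse = log_pearson([1, 10, 100], [500, 50, 5])
print(r_perfect, r_inverse)
```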
Functional Categories
[Figure: log-log scatter plot of protein abundance (0.1 to 1000) vs. mRNA expression (0.1 to 100)]
Cell Cycle (R = 0.71)
Reference Data (R = 0.66)
Cell Rescue (R = 0.45)
Co-regulated proteins
High: ion transport, interaction with the cellular environment, cell fate
Low: metabolism, fate, cellular communication/signal transduction mechanism
Subcellular Localization
[Figure: log-log scatter plot of protein abundance (0.1 to 100) vs. mRNA expression (0.1 to 100)]
Nucleolus (R = 0.8)
Cell Periphery (R = 0.74)
Reference Data (R = 0.66)
Mitochondria (R = 0.42)
MudPit does not have the 2DE biases
Lack of correlation in mitochondria concurs with experimental results from Ohlmeier S et al., JBC 2004
Bud: r = 0.76
Golgi: r = 0.28
Extracellular: r = 0.33
Nucleus: r = 0.49
Cytoplasm: r = 0.50
Mitochondria: r = 0.50
Cell wall: r = 0.52
Endosome: r = 0.87
ER: r = 0.61
Membrane: r = 0.73
P M
r global = 0.46
Expression as a function of localization is well correlated with protein levels (latest data)
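Per-category correlations like the ones listed per compartment can be computed by grouping ORFs before correlating. The sketch below is a hypothetical illustration: the record layout, the minimum group size and the values are assumptions, and the correlation is taken on log-transformed values.

```python
from collections import defaultdict

import numpy as np


def correlation_by_category(records, min_size=3):
    """Pearson r of log10(mRNA) vs log10(protein) within each category.

    `records` is an iterable of (category, mrna, protein) tuples; categories
    with fewer than `min_size` ORFs are skipped.
    """
    groups = defaultdict(list)
    for category, mrna, protein in records:
        groups[category].append((mrna, protein))
    result = {}
    for category, pairs in groups.items():
        if len(pairs) < min_size:
            continue
        x, y = np.log10(np.array(pairs, dtype=float)).T
        result[category] = float(np.corrcoef(x, y)[0, 1])
    return result


# Hypothetical records, not values from the talk.
records = [
    ("bud", 1, 10), ("bud", 10, 100), ("bud", 100, 1000),
    ("golgi", 1, 1000), ("golgi", 10, 100), ("golgi", 100, 10),
    ("er", 1, 5), ("er", 10, 7),
]
by_category = correlation_by_category(records)
print(by_category)
```

Small categories give unstable correlations, which is why the sketch skips groups below a minimum size.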
Why would we not find strong correlations?
Post translational modifications
Protein degradation
Error and Bias
[Figure: bar chart of correlation (0 to 1) for the top and bottom fractions of genes ranked by occupancy, CAI and coefficient of variation]
Ribosomal Occupancy
Arava et al. (2003) Proc. Natl. Acad. Sci. USA
Top Frac. 0.78
Bot. Frac. 0.30
Our results concurred with experimental findings by Brown and Herschlag’s groups
Moreover: mRNAs not associated with any polysomes have even less of a correlation (r = 0.2), v. strong translational control
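The top-fraction versus bottom-fraction comparison can be sketched as follows. This is an assumption-laden illustration: the size of the fractions used in the talk is not given, so `fraction` is a free parameter here, and the function name and data are hypothetical.

```python
import numpy as np


def top_bottom_correlation(feature, mrna, protein, fraction=0.25):
    """Correlate log mRNA with log protein in the top and bottom feature fractions.

    Genes are ranked by `feature` (for example ribosomal occupancy); returns
    (r_top, r_bottom).
    """
    feature = np.asarray(feature, dtype=float)
    x = np.log10(np.asarray(mrna, dtype=float))
    y = np.log10(np.asarray(protein, dtype=float))
    order = np.argsort(feature)
    k = max(3, int(len(order) * fraction))  # at least 3 genes per fraction
    bottom, top = order[:k], order[-k:]
    r_top = float(np.corrcoef(x[top], y[top])[0, 1])
    r_bottom = float(np.corrcoef(x[bottom], y[bottom])[0, 1])
    return r_top, r_bottom


# Hypothetical data: the four highest-feature genes track mRNA, the lowest four do not.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
mrna = [1, 10, 100, 1000, 1, 10, 100, 1000]
protein = [1000, 100, 10, 1, 1, 10, 100, 1000]
r_top, r_bottom = top_bottom_correlation(feature, mrna, protein, fraction=0.5)
print(r_top, r_bottom)
```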
Variability of mRNA expression
[Figure: bar chart of correlation (0 to 1) for the top and bottom fractions by coefficient of variation; inset plot of mRNA expression (0 to 40) over time]
mRNA Expression Variability
Top Frac. 0.89
Bot. Frac. 0.20
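The coefficient of variation used to rank genes by expression variability is the standard deviation of a timecourse divided by its mean. A minimal sketch, assuming the population standard deviation and hypothetical timecourse values:

```python
import statistics


def coefficient_of_variation(timecourse):
    """Standard deviation of an expression timecourse divided by its mean."""
    mean = statistics.mean(timecourse)
    return statistics.pstdev(timecourse) / mean


# Hypothetical timecourses: a flat gene and a strongly varying gene.
cv_flat = coefficient_of_variation([10, 10, 10, 10])
cv_variable = coefficient_of_variation([5, 15, 5, 15])
print(cv_flat, cv_variable)
```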
Codon Adaptation Index
[Figure: bar chart of correlation (0 to 0.6) for the top and bottom fractions by CAI]
Codon Usage
Top Frac. 0.48
Bot. Frac. 0.02
Concurs with experimental data: CAI does not predict mRNA and protein the same way, shown to be the result of different levels of degradation
Another summary
Newer, larger data set
Looking at broad categories
I. Post-translational modifications? Where we expect PT control --> low r. Where we don’t expect it --> high r
Occupancy, Variability
II. Protein degradation? CAI
III. Experimental error? Next section
Expression and interactions
Types of protein-protein interactions
– Protein complexes
• For example: proteasome, ribosome
– Aggregated interactions
• Yeast two-hybrid (Y2H)
• Genetic/physical interactions from MIPS
Relationship of P-P interactions to abs. expression level
$$D_{ij} = \frac{|E_i - E_j|}{E_i + E_j}$$
similar protein results
Protein-Protein Interactions & Expression Correlations
[Figure: distributions of correlations between selected expression timecourses, Cell Cycle CDC28 expt. (Davis)]
Sets of interactions:
• All pairs (control)
• Pairwise interactions (from MIPS)
• Pairwise interactions (Uetz et al.)
• Strong interactions in permanent complexes (clearly different)
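The comparison between interacting pairs and the all-pairs control can be sketched as below. The gene labels, timecourses and the single interacting pair are hypothetical, and comparing mean correlations is a simplification of comparing full distributions.

```python
from itertools import combinations

import numpy as np


def mean_pair_correlation(profiles, pairs):
    """Average Pearson correlation of expression timecourses over gene pairs."""
    values = [np.corrcoef(profiles[a], profiles[b])[0, 1] for a, b in pairs]
    return float(np.mean(values))


# Hypothetical expression timecourses for four genes.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
interacting = [("g1", "g2")]                        # assumed known interaction
all_pairs = list(combinations(sorted(profiles), 2))  # control: every gene pair
r_interacting = mean_pair_correlation(profiles, interacting)
r_control = mean_pair_correlation(profiles, all_pairs)
print(r_interacting, r_control)
```

If interacting proteins are co-expressed, the interacting set should sit clearly above the control, as the slide reports for permanent complexes.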
Permanent vs. Transient Complexes
[Figure: scatter plot of permanent and transient complexes; x-axis: CC (-0.2 to 1.2); y-axis: Rosetta (-0.2 to 1); labeled points: L Ribosome, S Ribosome, SAGA]
Representing Expression Correlations within a Large Complex in a Matrix
[Matrix figure: rows and columns in the same order: MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1; color scale: correlation]
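A minimal sketch of how such a matrix could be built, assuming each subunit has an expression time course and that the entries are Pearson correlations between pairs of time courses. The function names and the numbers in `toy_profiles` are hypothetical and chosen only for illustration; they are not measurements from the talk.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for a constant profile

def correlation_matrix(profiles):
    """Return (names, matrix) with matrix[i][j] = correlation of profile i and j."""
    names = list(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix

# Made-up time courses (NOT real data), keyed by subunit name.
toy_profiles = {
    "MCM3": [1.0, 2.0, 3.0, 2.0, 1.0],
    "MCM6": [1.1, 2.1, 2.9, 2.2, 0.9],
    "ORC1": [3.0, 2.0, 1.0, 2.0, 3.0],
}

names, matrix = correlation_matrix(toy_profiles)
for name, row in zip(names, matrix):
    print(f"{name:5s}", " ".join(f"{v:+.2f}" for v in row))
```

In this toy input the two MCM profiles rise and fall together while the ORC1 profile moves in the opposite direction, so the matrix shows one tightly correlated block and one anti-correlated entry. This is the kind of block structure the slide's color map makes visible.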
![Page 65: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/65.jpg)
Permanent? Transient?
correlation
![Page 66: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/66.jpg)
L7/L12
correlation
Cell degrades all excess ribosomal proteins, except L7 & L12
![Page 67: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/67.jpg)
Expression Correlations Segment Large Replication Complex into Component Parts
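[Matrix figure: correlation matrix over MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1, with blocks labeled "MCM prots.", "Polym. &", and "ORC"]
Temporally transient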
![Page 68: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/68.jpg)
Proteasome
No distinction visible between components
Does this indicate that the two components are really one?
"Division is an artifact of their discovery" (M. Hochstrasser)
Overall .43, 20S .50, 19S .51
![Page 69: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/69.jpg)
% ORFs in complexes with significant correlation
Complex (> 2 ORFs, P < 0.001) n alpha Cdc15 Cdc28 Rosetta
Alpha, al-treh. anchor (50) 4 75% 75%
Calcineurin B (100) 3 67% 67%
Chaperone containing T-complex TRiC (130) 8 50% 25%
Pho85p (133.20) 6 33%
Glycine decarboxylase (200) 3 67%
ATPase (210) 4 100% 50%
TRAPP (260.60) 10 40%
Vps4p ATPase (260.70) 3 67%
Nucleosome protein (320) 8 100% 87% 37% 75%
Cytochrome bc1 complex (420.30) 9 44% 78% 78%
Cytochrome c oxidase (420.40) 8 50% 38% 88% 50%
F0/F1 ATP synthase (complex V)(420.5) 15 60%
Ribonucleoside reductase (430) 4 50%
Nuclear processing (440.10.10) 5 40%
RNA polymerase I (510.10) 8 38% 38% 50%
RNA polymerase II (510.40.10) 9 44%
Tornow & Mewes NAR 2003
![Page 70: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/70.jpg)
Average Expression of all subunits in a complex
[Log-log scatter plot: mRNA expression (×10³) on the x-axis, protein abundance on the y-axis]
Fit: $y = 3028.4\,x^{1.0635}$, $R^2 = 0.6076$
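The fit on this slide has the power-law form $y = a\,x^{b}$. A minimal sketch of one common way to obtain such a fit is ordinary least squares on the log-transformed values. Whether the slide's fit was computed this way is an assumption. The function name `fit_power_law` is hypothetical, and the toy points are generated from the slide's fitted curve rather than taken from the underlying data.

```python
from math import log10

def fit_power_law(xs, ys):
    """Least-squares line through (log10 x, log10 y).

    Returns (a, b, r2) for the model y = a * x**b, where r2 is the
    coefficient of determination in log space.
    """
    lx = [log10(x) for x in xs]
    ly = [log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                   # slope in log space = exponent
    a = 10 ** (my - b * mx)         # intercept in log space = log10(prefactor)
    r2 = (sxy * sxy) / (sxx * syy)  # squared correlation of the log values
    return a, b, r2

# Synthetic, noise-free points lying on the slide's fitted curve (NOT real data).
toy_x = [0.1, 1.0, 10.0, 100.0]
toy_y = [3028.4 * x ** 1.0635 for x in toy_x]

a, b, r2 = fit_power_law(toy_x, toy_y)
print(f"a = {a:.1f}, b = {b:.4f}, R^2 = {r2:.4f}")
```

Because the toy points lie exactly on the curve, the fit recovers the slide's prefactor and exponent. With real, noisy averages per complex the same procedure would give an $R^2$ below 1, as on the slide.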
![Page 71: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/71.jpg)
PP INT Summary
Complexes: broad categories minimize noise
– Permanent complexes show strong co-expression
Post-transcriptional regulation functions at a whole-complex level (Washburn et al., PNAS 2003)
– Transient complexes have weaker co-expression
Aggregated BINARY interactions (Y2H, physical, genetic)
Weak co-expression, similar to transient complexes: noisy data?
ERROR? Minimized in larger groups
![Page 72: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/72.jpg)
Global Summary
mRNA expression is related to protein abundance
Broad categories minimize noise that prevents us from seeing this correlation
Integrating various genomic data is integral to an analysis
Biologically relevant results can be seen when looking at mRNA and protein populations
![Page 73: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/73.jpg)
Future Research
Further in-depth analysis of protein degradation
Integrate new TAP-tagging data into the protein abundance reference set
More intensive modeling of the relationship between mRNA and protein
![Page 74: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/74.jpg)
Relationship between mRNA and Protein levels
$$\frac{dP_i}{dt} = k_{s,i}\,\mathrm{mRNA}_i - k_{d,i}\,P_i$$
where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants, respectively, and [symbol missing in source] is the growth rate
At steady state: $$P_i = \frac{k_{s,i}\,\mathrm{mRNA}_i}{k_{d,i}}$$
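The steady-state expression follows from setting the time derivative to zero. As a quick numerical check, the sketch below integrates the rate equation with a simple forward-Euler scheme and compares the result with the steady-state formula. The function names and the parameter values are hypothetical and in arbitrary units. The mRNA level is assumed constant, and the growth-rate term is left out because it does not appear in the equation as shown.

```python
def simulate_protein(ks, kd, mrna, p0=0.0, dt=0.01, t_end=100.0):
    """Forward-Euler integration of dP/dt = ks*mRNA - kd*P; returns P(t_end)."""
    p = p0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        p += dt * (ks * mrna - kd * p)
    return p

def steady_state(ks, kd, mrna):
    """Steady-state protein level P = ks*mRNA / kd."""
    return ks * mrna / kd

# Hypothetical rate constants and mRNA level (arbitrary units, NOT measured values).
ks, kd, mrna = 2.0, 0.1, 5.0

p_sim = simulate_protein(ks, kd, mrna)
p_ss = steady_state(ks, kd, mrna)
print(f"simulated P = {p_sim:.4f}, steady-state formula = {p_ss:.4f}")
```

The simulated level approaches the formula's value on a time scale of roughly $1/k_{d,i}$. A fast-decaying protein therefore tracks changes in its mRNA more quickly than a slow-decaying one.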
![Page 75: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/75.jpg)
N end rule PEST?
N End Rule in Yeast
[Bar chart: In Vivo Half-Life (min), log scale from 1 to 10000, by amino acid (AA): Arg, Lys, Phe, Leu, Trp, Asn, His, Asp, Gln, Tyr, Ile, Glu, Cys, Ala, Ser, Thr, Gly, Val, Pro, Met; groups labeled Fast Decay and Slow Decay]
![Page 76: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/76.jpg)
Results of protein degradation
Significantly higher correlation for fast-decaying proteins
Not for slow decay
High decay rate is indicative of greater cellular control over level, e.g. proteins with half-lives of days: the cell can't tightly control them
Results are the same for mRNA degradation (half-lives have been quantified)
![Page 77: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/77.jpg)
Acknowledgments
Gerstein Lab
This work: Ronald Jansen (MSKCC), Yuval Kluger (NYU)
Other projects: Haiyuan Yu, Hedi Hegyi, Jimmy Lin, Rajdeep Das, Jiang Qian, Nick Luscombe
Entire Gerstein Lab
Weissman Lab: Zheng Lian
Keck (HHMI Biopolymer Laboratory and W. M. Keck Foundation Biotechnology Resource Laboratory): Christopher Colangelo, Ken Williams
Thesis Committee: Mark Gerstein, Sherman Weissman, Kevin White
Genetics Department
SABRINA
![Page 78: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004](https://reader036.vdocument.in/reader036/viewer/2022062716/56649dc75503460f94abbf80/html5/thumbnails/78.jpg)
Liana