
Annu. Rev. Genomics Hum. Genet. 2004. 5:267–93. doi: 10.1146/annurev.genom.4.070802.110305

Copyright © 2004 by Annual Reviews. All rights reserved. First published online as a Review in Advance on May 21, 2004

PROTEOMICS

Carmen L. de Hoog and Matthias Mann
Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark; email: [email protected], [email protected]

Key Words protein-protein interactions, mass spectrometry, functional genomics, protein arrays, organellar proteomics, systems biology

■ Abstract The genome sequences of important model systems are available, and the focus is now shifting to large-scale experiments enabled by these data. Following in the footsteps of genomics, we have functional genomics, proteomics, and even metabolomics, roughly paralleling the biological hierarchy of transcription, translation, and the production of small molecules. Proteomics is initially concerned with determining the structure, expression, localization, biochemical activity, interactions, and cellular roles of as many proteins as possible. There has been great progress owing to novel instrumentation, experimental strategies, and bioinformatics methods. The area of protein-protein interactions has been especially fruitful. First-pass interaction maps of some model organisms exist, and the protein inventories of many important organelles are about to be determined. Researchers are also beginning to integrate large-scale data sets from various “omics” disciplines in targeted investigations of specific biomedical areas and in pursuit of a general framework for systems biology.

INTRODUCTION

There have been a number of revolutions in molecular biology in the past few decades, most recently the completion of the immense effort to sequence the human genome (55, 106). Apart from the information inherent in the sequence itself, the genome provides researchers with an inventory of genes with which to pursue comprehensive, global inquiries into the working of the cell and organism. On one hand, genomics and the other “omics” disciplines provide an infrastructure to “turbocharge” traditional biological research into the functions of single genes and the mechanisms of specific cellular processes. For example, the genomic sequence of a gene of interest, a full-length clone, the cellular location of the protein, tissue expression, and several possible interaction partners may already be available at the outset of a project. On the other hand, the evolving large-scale methods of studying biology will also transcend traditional biology by providing systems-level information and patterns that no amount of directed, hypothesis-driven research could produce. Together, increasingly effective single-gene, mechanism-based biology and system-wide biology will define the life sciences, revolutionize medicine, and eventually contribute to a better understanding of ourselves.

1527-8204/04/0922-0267$14.00 267


Genomics now concentrates on sequencing additional organisms for comparative purposes and on defining genetic differences between individuals to construct haplotype maps (29). Even for the main model systems, the task of genomics is not yet finished. The genomes are not fully annotated, and the gene inventory is still hotly disputed. In the case of the C. elegans genome, which is of exemplary quality, more than half of the gene structures need to be corrected (87), and reported human gene numbers and identities vary even more.

In the last few years, it has become widely recognized that the genome only represents the first layer of complexity. Biological function is not carried out by the static genome but mainly by the dynamic population of proteins, determined by an interplay of gene and protein regulation with extracellular influences. Even the parts list of mature proteins, including splice variants and post-translational modifications, cannot be predicted from the genome sequence. There are numerous instances of differential splicing and many post-translational protein modifications (e.g., phosphorylation, glycosylation, ubiquitination, and methylation) that can govern the behavior of proteins more than differing rates of synthesis. Although mRNA profiling through microarrays offers immense potential for increasing the understanding of molecular changes that occur during biological processes, including disease progression, it does not capture mechanisms of regulation involving changes in cellular localization, sequestration by interaction partners, proteolysis, and recycling.

For these reasons, there is increasing interest in the field of proteomics, or the large-scale study of proteins, as a complement to genomics and functional genomics. The proteome was originally defined as the complete protein complement expressed by a genome (108). However, this definition does not take into account that the proteome is a highly dynamic entity that will change based on cellular state and the extracellular milieu. Therefore, the definition of a proteome should specify that it is the protein complement of a given cell at a specified time, including the set of all protein isoforms and protein modifications. Although proteomics is a large-scale endeavor, much of its attraction lies in its ability to focus its tools on selected populations of proteins in specific circumstances, contributing directly to functional and mechanistic questions.

In this review we hope to place proteomics into context, broadly outline its main themes and achievements with examples, and project its future impact.

GENOMICS, FUNCTIONAL GENOMICS, AND PROTEOMICS

In recent years, a bewildering and sometimes silly terminology has sprung up around large-scale biological experiments. Figure 1 gives a simplified overview of the main categories. The left-hand part of the figure schematically depicts gene expression starting from the gene and proceeding through the transcript to the mature protein. We also included a fourth layer to symbolize small molecules such as metabolites or small signaling molecules because their uptake, synthesis, and regulation are largely under the control of the proteome. The triangle depicts the corresponding areas of large-scale biological approaches. Genomics aims to determine the linear chromosomal sequence of model organisms, as well as sequence differences between individuals. Annotating the genome, including defining coding and regulatory sequences, is also part of genomics. The next level down deals with transcripts [messenger RNA (mRNA)]. Whereas the defining technology in genome sequencing is the automated DNA sequencer, in functional genomics it is microarray hybridization, which determines the level of mRNA in cells or tissues or the relative level between two states. There is not necessarily a one-to-one relationship between the genome sequence and the transcript; cells use alternative splicing to produce different transcripts from one gene and a whole range of mechanisms to control the production, transcriptional status, and degradation of mRNA. In addition to global measurement of transcript levels in myriad conditions, functional genomics encompasses various strategies where the message is eliminated and the resulting cellular effects are observed. This part of functional genomics, when carried out on whole organisms, is also called phenomics because the phenotype of an organism is observed (79). There are many other functional genomics strategies; they are all performed at the level of oligonucleotides rather than at the level of proteins.

Proteins, the main carriers of biological activity, embody the next level of complexity. Protein function depends on the precise amino acid sequence, the modifications (especially regulatory ones such as phosphorylation), the three-dimensional (3D) structure, the protein concentration, the association with other proteins, and the extracellular environment. Accordingly, proteomics seeks to determine protein structure, modifications, localization, and protein-protein interactions in addition to protein expression levels. Mass spectrometry (MS) is currently the most versatile technology to directly measure endogenous proteins (4). Non-mass spectrometric technologies in proteomics involve array-based systems (for example, the yeast two-hybrid assay for protein-protein interactions) or structural and imaging tools.

Many proteins are enzymes or regulate the function of enzymes that ultimately determine the level of small molecules. A comprehensive view of the cell requires that these small molecules are also measured and modeled. The corresponding large-scale discipline, which is most advanced in engineering and metabolite flux analysis, is sometimes called “metabolomics.” Experimental approaches in metabolomics often involve MS, as they do in proteomics.

The above definitions are not cast in stone. On one hand, there is no general agreement on the nomenclature and the delineation of these disciplines. On the other hand, experimental approaches often overlap these categories. In any case, one can hardly overemphasize that none of the above approaches has a privileged, much less exclusive, position in the quest to elucidate biological function. Rather, each discipline will contribute. The dream of a quantitative understanding of cellular systems (systems biology) will be achieved by a combination of these approaches with traditional mechanism-based biology.


TECHNOLOGIES OF PROTEOMICS

The types of large-scale experiments performed in the field of proteomics require specialized tools, and each tool may be suited to a particular type of experimental design or question asked.

In all large-scale experiments, automation is necessary. Genomics has led the way in miniaturization and robotization, and this theme is followed in all areas of functional genomics and proteomics. Technologically, some proteomics experiments involve little more than well-known laboratory procedures streamlined to the scale of whole proteomes. Most proteomic strategies depend on a supply of reagents, such as full-length cDNA clones in convenient expression vectors, libraries of recombinant proteins, and corresponding specific antibodies. Making these reagents available is a goal of proteomics and of proteomics associations such as the Human Proteome Organization (HUPO, http://www.hupo.org). Furthermore, proteomics is becoming more dependent on bioinformatics for the straightforward but not trivial task of storing proteomic data in databases, as well as for pattern extraction, interpretation, and statistical evaluation of results and hypotheses. Bioinformatics also generates in silico data sets that can be tested by proteomics. It is not easy to compare proteomics results between laboratories because the technologies are still evolving rapidly and will not be standardized for quite some time. Efforts are nevertheless underway to define common reporting categories and forms (78, 102). Specialized proteomics databases are also being set up, for example, for protein-protein interactions (10, 112, 113). Beyond these databases, we will need multidimensional knowledge environments for storing, visualizing, and interacting with biological knowledge and proteomic data. Initial efforts in this direction are already underway (47, 61, 94).

Proteomics is especially well suited to drug discovery because most drugs inhibit the function of specific proteins. Therefore, several successful strategies have been developed that marry small-molecule screens with a proteomic readout. One of them, activity-based profiling, seeks to label groups of proteins with common enzymatic function by covalent tags, which are used to retrieve and quantify these proteins (97). In other methods, proteins are eluted from immobilized drugs, making them available for proteomic study (32, 34, 75). These techniques use the proteomic analysis capabilities developed over the last few years to build assays that yield populations of endogenous proteins interacting with drugs, rather than following the current single drug–single target paradigm.

Three main areas of proteomics technologies can be distinguished. One is MS, which is on a trajectory to develop increasingly sensitive and comprehensive analyses of endogenous protein mixtures. The second is array-based proteomics, which owes much to cDNA microarrays and oligonucleotide chips. However, in proteomics, antibodies, recombinant proteins, or even cell populations are arrayed and thereby made addressable and identifiable solely by their grid position. The third area is structure and imaging, where the 3D shape, localization, and dynamics of individual proteins or protein complexes are investigated on a large scale.
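The idea that an arrayed reagent is identified only by where it sits on the grid can be sketched in a few lines of Python. This is a hypothetical illustration, not a published analysis method: the layout, the reagent names, and the rule of calling a spot positive when it exceeds three times the median signal of the whole array are all assumptions made for the example.

```python
from statistics import median

# Hypothetical 2 x 3 array layout: each grid position holds one known capture reagent.
LAYOUT = {
    (0, 0): "antibody_A", (0, 1): "antibody_B", (0, 2): "antibody_C",
    (1, 0): "antibody_D", (1, 1): "antibody_E", (1, 2): "antibody_F",
}

def call_hits(intensities, layout, fold_over_background=3.0):
    """Return reagents whose spot signal exceeds a multiple of the array-wide median.

    intensities: rows of scanned spot intensities (arbitrary units).
    The identity of each hit comes only from its (row, col) position in `layout`.
    """
    values = [v for row in intensities for v in row]
    background = median(values)  # assumed global background estimate
    hits = []
    for r, row in enumerate(intensities):
        for c, value in enumerate(row):
            if value > fold_over_background * background:
                hits.append(layout[(r, c)])
    return hits
```

For a scan of [[100, 120, 900], [110, 95, 105]], only the spot at row 0, column 2 clears the threshold, so the hit is reported as antibody_C purely from its position.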


Table 1 lists the types of tools in use today; below we only outline developments in the three main technological areas.

Mass Spectrometry

MS for protein identification relies on the digestion of protein samples into peptides by a sequence-specific protease such as trypsin. Peptides are much more amenable to MS analysis than whole proteins for two reasons: proteins cannot easily be eluted from gels, and the molecular weight of a whole protein is usually not sufficient for identification, because proteins are heterogeneous and therefore usually do not possess a single molecular weight. After the proteins are digested, the peptides are often delivered to the mass spectrometer via chromatographic separation coupled online to electrospray ionization (LC-MS, for liquid chromatography mass spectrometry). Matrix-assisted laser desorption/ionization (MALDI) is an alternative ionization method, but it is less frequently applied to the proteomic analysis of complex protein mixtures. We recently reviewed MS in more detail (4, 100).
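The in silico counterpart of the trypsin digestion step can be sketched as follows. It assumes the common simplified cleavage rule (cut after lysine or arginine, except before proline); real search engines additionally handle protein termini, modifications, and other enzymes, and the function name here is hypothetical.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return the peptides produced by an idealized trypsin digest of `protein`.

    Simplified rule: cleave after K or R unless the next residue is P.
    `missed_cleavages` also returns peptides spanning up to that many
    uncleaved internal sites.
    """
    if not protein:
        return []
    # Boundaries: sequence start, every position just after a cleavage site, sequence end.
    cuts = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(protein))

    peptides = []
    for start in range(len(cuts) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(cuts):
                peptides.append(protein[cuts[start]:cuts[end]])
    return peptides
```

For example, tryptic_digest("GASKPLLKMNRTTT") yields GASKPLLK, MNR, and TTT; the K–P bond is left intact by the proline rule.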

Mass spectrometers consist of a few simple modules: an ionization mechanism; a section to separate, select, and fragment peptides; and a detector. The machine can be instructed to first measure the masses of all the peptides eluting at any given time from the chromatography column, and then select a number of peptide ions in turn for fragmentation. This happens by allowing only ions of a particular mass through to a “collision cell,” where the peptide ions' kinetic energy is increased and they collide with inert gas molecules with sufficient energy to break peptide bonds. The resulting spectrum is called a tandem, or “MS/MS,” spectrum and generally contains several adjacent fragments that spell out a partial sequence of the peptide in question. These fragments are then compared to fragments calculated from all peptides in protein databases to arrive at protein identifications. The proteomic data set produced by MS therefore consists of thousands of peptide mass spectra, each followed by several peptide fragmentation spectra. The mass spectra contain information on the mass and intensity of the peptide peaks, whereas the tandem mass spectra are solely used to identify the peptides.
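The fragments against which a tandem spectrum is compared can be calculated as in the following sketch, which computes the neutral monoisotopic mass of a peptide and its singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments). It uses standard monoisotopic residue masses and assumes unmodified residues and singly charged fragments only; scoring the match against an observed spectrum is not shown.

```python
# Standard monoisotopic residue masses (Da) for the 20 unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the intact peptide (N- and C-termini)
PROTON = 1.007276   # charge carrier for singly charged ions

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def fragment_ions(seq):
    """Singly charged b ions (b1, b2, ...) and matching y ions (longest first).

    b[i] and y[i] come from breaking the same peptide bond, after residue i + 1.
    """
    b, y = [], []
    for i in range(1, len(seq)):
        b.append(sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON)
        y.append(sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON)
    return b, y
```

Each b ion and its complementary y ion together account for the whole peptide, so their m/z values sum to the peptide mass plus two protons; a ladder of such fragments is what spells out the partial sequence.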

TABLE 1 The tools of proteomics

Method | Description | Applications | Practicality | Refs.
Mass spectrometry | Digest proteins and fragment peptides to identify proteins | Protein identification, sequence, post-translational modifications, quantitation | Need access to instrument and expertise to run samples | (4)
Yeast two-hybrid | Genetic approach based on transcription factor reconstitution that identifies protein interactions | Protein-protein interaction; protein-DNA interaction | No specialized instruments required, but high false positive rate | (30, 58, 105)
Chips | Synthesize proteins, peptides, antigens, or antibodies into an array format and spot onto slides | Protein interactions with proteins, lipids, and small molecules; drug discovery; post-translational modifications; clinical diagnostics | Need access to a facility that can provide chips and analysis capabilities | (31, 83, 93, 120)
2D gels | Separation of protein samples on the basis of charge and molecular weight | Post-translational modification studies, differential protein expression | Reproducibility issues, dynamic range problems | –
Bioinformatics | In silico proteomics | Mining databases and/or large-scale data sets, predicting protein interactions | Relies on experimental data for prediction and confirmation | –
Structure-based | X-ray crystallography, NMR spectroscopy, electron tomography, 2D electron microscopy | Structure of macromolecular assemblies, determination of protein subunit contacts and proximity | Need access to equipment used for structure determination | (88)
Small molecule screens | Screen a library of small molecules, looking for compounds that affect a specific cellular function | Protein-small molecule interactions, drug discovery, enzyme specificity studies | – | (53, 109)
Activity-based profiling | Active site-directed chemical modification, directed toward a specific class of biological activity (e.g., serine hydrolases) | Detection, isolation, and identification of active enzymes from complex protein mixtures | – | (2, 96, 97)
Imaging | Fluorescence resonance energy transfer between fluorophores that are in close proximity (<100 Å); transfected cell arrays; GFP fluorescence | Protein-protein interaction in live cells; protein localization | Clones required, relatively simple | (42, 83, 121)
Reagents | Antibodies, cDNAs, recombinant protein sets, siRNA constructs | Protein-protein interaction, localization, affinity pulldowns, protein validation | Significant outlay of resources to construct clones, purify proteins, generate antibodies | –

Several algorithms have been developed to match tandem mass spectra to peptide sequences (100). These algorithms calculate a score for how well the spectrum agrees with that expected for the retrieved peptide sequence. Peptides are accepted as genuine hits if they fulfill criteria such as a score above a chosen significance cutoff, correct enzymatic cleavage, and the absence of other peptides that match with similar scores. Regardless of algorithm, peptides with a significant identification score are combined into a long list of proteins present in the mixture. However, it is now clear that large-scale application of peptide identification can lead to high levels of false positive hits if proper care is not taken (14, 48, 80). This is particularly true of proteins identified by a single peptide and when researchers have allowed peptides that do not match tryptic cleavage rules to count as matches (75a). Papers published last year show that, rather than a single, strict set of parameters intended to guarantee correct peptide identification, probabilistic strategies are better suited to optimizing the trade-off between false positive and false negative hits (8, 70). We strongly recommend manually inspecting the raw mass spectrometric data if there is any ambiguity in peptide identification.
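The rule-based acceptance criteria described above (score cutoff, correct enzymatic cleavage, and no similarly scoring competitor) can be sketched as a simple filter. The field names and thresholds are hypothetical, and this is the fixed-parameter approach that the probabilistic strategies (8, 70) are meant to improve upon, not an implementation of them. The last function counts distinct peptides per protein so that single-peptide identifications, which are particularly error-prone, can be flagged for manual inspection.

```python
from dataclasses import dataclass

@dataclass
class PeptideMatch:
    sequence: str
    score: float              # best score for this spectrum (higher is better)
    second_best_score: float  # runner-up candidate for the same spectrum
    preceding_residue: str    # residue before the peptide in its protein, "-" at the N-terminus
    following_residue: str    # residue after the peptide, "-" at the C-terminus

def is_tryptic(m):
    """Both peptide ends are consistent with cleavage after K/R but not before P."""
    n_ok = m.preceding_residue == "-" or (
        m.preceding_residue in ("K", "R") and m.sequence[0] != "P")
    c_ok = m.following_residue == "-" or (
        m.sequence[-1] in ("K", "R") and m.following_residue != "P")
    return n_ok and c_ok

def accept(m, min_score=40.0, min_relative_gap=0.1):
    """Score cutoff, enzymatic cleavage, and no close competitor (hypothetical thresholds)."""
    if m.score <= 0 or m.score < min_score:
        return False
    relative_gap = (m.score - m.second_best_score) / m.score
    return is_tryptic(m) and relative_gap >= min_relative_gap

def peptides_per_protein(accepted, protein_of):
    """Distinct accepted peptides per protein; a count of 1 marks a single-peptide hit."""
    support = {}
    for m in accepted:
        support.setdefault(protein_of[m.sequence], set()).add(m.sequence)
    return {protein: len(seqs) for protein, seqs in support.items()}
```

Because every threshold is fixed in advance, such a filter trades false positives against false negatives only implicitly; that limitation is what motivates the probabilistic approaches cited in the text.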

Mass spectrometers and associated chromatographic equipment continue to improve and mature. Dozens or even hundreds of proteins in a given mixture can now be identified with relatively affordable instrumentation. However, there are still few laboratories able to perform proteome-scale and high-sensitivity experiments. Although running the mass spectrometers is less of a challenge than it used to be, the combined know-how from protein preparation to analysis of results still requires significant expertise in a single group. Thus, dedicated proteomic centers could benefit from economies of scale and have been proposed on a national scale (5).

An exciting combination of a linear ion trap with a high-performance analyzer, the Fourier transform ion cyclotron resonance mass spectrometer (FT-ICR or FT-MS; see Reference 65 for an introduction), was introduced in 2003 (41). The linear ion trap is an improved version of the common ion trap used in many proteomic experiments; the difference is that the ions are stored in a cylindrical rather than point-like field with much higher capacity. In an FT-ICR detector, the peptide ions are confined within a strong magnetic field and orbit at a precise frequency that is inversely proportional to their mass-to-charge ratio. This instrument combines the speed and robustness of the linear ion trap with the greatly increased mass accuracy and sensitivity of an FT-ICR detector, which enables analysis of extremely complex mixtures containing thousands of peptides. The combined instrument can sequence peptides approximately ten times faster than previous machines and offers substantially higher resolution and mass accuracy. Much of the uncertainty in peptide identification is directly related to the mass accuracy of the instrument: the greater the accuracy, the fewer peptides in a given genome are possible matches. Due to its greatly improved mass accuracy and the quality of the spectra it generates, the FT-MS increases the confidence of protein identifications by more than a factor of 100 (75a).
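Two relations from this paragraph can be made concrete: the cyclotron frequency, which for an ideal trap is f = zeB/(2πm) and therefore inversely proportional to m/z, and the effect of mass accuracy on the number of candidate peptides. The sketch below assumes an ideal (unperturbed) trap, ignoring the trapping-field corrections that real instruments calibrate for; the candidate masses in the usage example are invented for illustration.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per unified atomic mass unit

def cyclotron_frequency_hz(mass_da, charge, field_tesla):
    """Ideal cyclotron frequency f = z*e*B / (2*pi*m), in Hz."""
    return charge * ELEMENTARY_CHARGE * field_tesla / (2 * math.pi * mass_da * DALTON)

def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def candidates_within(measured, candidate_masses, tol_ppm):
    """Keep only theoretical peptide masses consistent with the measured mass."""
    return [m for m in candidate_masses if abs(ppm_error(measured, m)) <= tol_ppm]
```

A singly charged ion of mass 1,000 Da in a 7-tesla field orbits at roughly 107.5 kHz, and halving m/z doubles the frequency. With a 5 ppm tolerance, candidates_within(1000.0, [999.990, 999.9995, 1000.004, 1000.02], 5) keeps only the two masses within 5 ppm, whereas at 50 ppm all four remain, which is the sense in which higher accuracy leaves fewer possible matches.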

High-accuracy MS data sets can also directly assist in gene prediction. Consider a sequenced peptide present in a region of the genome where no gene is predicted: the existence of the peptide establishes that a protein product is produced from that stretch of DNA, and this can help to annotate the genome. During the proteomic analysis of the malaria parasite, Lasonder et al. (56) noticed that many peptides with clear, easily interpretable fragmentation spectra were not identified in the predicted malaria proteome. Searching the genomic data with these “orphan” peptides resulted in the refinement of some gene annotations. Similarly, MS can in principle assist in identifying alternative splicing events by identifying N-terminal peptides from proteins or by identifying peptides that can only arise if particular exons are spliced out. However, this requires substantial coverage of the protein sequence with peptides, which is not yet generally achieved in proteomic experiments. Note that although mass spectrometric protein identification requires a sequenced genome, the sequence need not be complete and finished; even the rough draft of the human genome was sufficient to confidently identify proteins (54).
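Searching genomic sequence with an orphan peptide, as in the malaria example, amounts to translating the DNA in all six reading frames and looking for the peptide. The following is a minimal sketch under simplifying assumptions: the standard genetic code, an exact string match, and no handling of introns, so a peptide spanning a splice junction would be missed. Because isoleucine and leucine have identical masses, they are treated as the same residue. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated with the first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate codon by codon; '*' marks stops, 'X' marks codons with ambiguous bases."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frame_hits(peptide, dna):
    """Locate a peptide in all six reading frames of a DNA stretch.

    Returns (strand, frame offset, amino acid index) tuples. I and L are
    treated as identical because MS cannot tell them apart by mass.
    """
    query = peptide.replace("I", "L")
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:]).replace("I", "L")
            pos = protein.find(query)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(query, pos + 1)
    return hits
```

For the 25-nucleotide stretch ACCTGAACCTACAATTGATGAATAA, six_frame_hits("PEPTIDE", ...) reports a single hit on the forward strand at frame offset 1, the kind of evidence that would point to an unannotated coding region.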

Until recently, global protein levels and differences in protein composition between cell types were investigated by two-dimensional (2D) gel electrophoresis, a technique that used to be synonymous with proteomics. However, it is now clear that the hundreds or thousands of spots displayed on 2D gels correspond at most to a few hundred genes, that hydrophobic proteins are difficult to analyze, and that the dynamic range of the technique is severely limited. Furthermore, MS can generate an immense amount of data on very small sample sizes in an automated manner, something that remains a problem with 2D gel electrophoresis. However, for MS to replace 2D gels, quantitative techniques are needed to replace the semiquantitation previously provided by gel staining, as described below.

QUANTITATION Although MS is an exquisitely sensitive method for identifying proteins in proteomic studies, it is not a quantitative technique per se. The intensity of a peptide peak depends linearly on the concentration of the peptide; however, different peptides have different propensities for ionization. Therefore, two peptides present in equimolar amounts may show substantially different intensities in the mass spectra. Thus, most quantitative techniques rely on modifying one of the samples with a stable (nonradioactive) isotope, which changes the molecular mass but not the mass spectrometric behavior. Quantitative differences are then determined directly as the ratio of the peak areas of the two isotopic forms of each peptide in the mixed sample. Broadly speaking, there are two approaches for labeling peptides with stable isotopes: chemical incorporation of the mass tag, usually by reaction with cysteines after purifying the proteins; or metabolic incorporation, where labeling of the proteins is achieved by growing cells in a medium enriched in stable isotope-containing precursors (59).
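Because the light and heavy forms of the same peptide ionize identically, their peak-area ratio cancels the peptide-specific ionization propensity. The sketch below computes heavy/light ratios from (light, heavy) peak areas and summarizes a protein by the median of its peptide log2 ratios; the median is one common, but assumed, choice of summary statistic, and peak detection and integration are not shown.

```python
import math
from statistics import median

def peptide_ratios(pairs):
    """Heavy/light ratio for each peptide from (light_area, heavy_area) pairs.

    Pairs with a missing (zero) partner are skipped rather than producing 0 or inf.
    """
    return [heavy / light for light, heavy in pairs if light > 0 and heavy > 0]

def protein_log2_ratio(pairs):
    """Median of peptide log2 ratios; the median damps single outlier peptides."""
    logs = [math.log2(r) for r in peptide_ratios(pairs)]
    return median(logs)
```

For peptides with areas (1000, 2000), (500, 1000), (800, 1600), and (100, 1000), the protein log2 ratio is 1.0, a twofold increase; the single aberrant tenfold peptide does not shift the median.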

In pioneering work, Gygi et al. (36) utilized a cysteine-reactive chemical reagent containing a biotin tag and a linker region bearing either zero or eight deuterium atoms; this was named the isotope-coded affinity tag (ICAT) reagent. The reagents are relatively large and, because they remain on the peptide during analysis, fragments resulting from the reagent complicate interpretation of the MS/MS spectra. Therefore, there have been several modifications to ICAT reagents: 12C/13C reagents have no LC elution shift, unlike the deuterium-labeled ones (37); acid-labile reagents have nine 13C atoms and a chemically cleavable linker to remove the biotin moiety (57); and solid-phase ICAT reagents facilitate effective peptide recovery (115).
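The mass difference that distinguishes light from heavy ICAT-labeled peptides follows directly from the label counts given above: eight deuterium atoms in the original reagent and nine 13C atoms in the acid-cleavable version. The sketch below uses standard isotope masses, assumes one tag per cysteine, and reports the gap in m/z for a given charge state; the function and dictionary names are illustrative.

```python
H1, H2 = 1.00782503207, 2.0141017778   # hydrogen-1 and deuterium masses (Da)
C12, C13 = 12.0, 13.0033548378          # carbon-12 and carbon-13 masses (Da)

# Heavy-minus-light mass difference contributed by one cysteine tag.
SHIFT_PER_TAG = {
    "d8": 8 * (H2 - H1),      # original ICAT: eight deuterium atoms
    "13C9": 9 * (C13 - C12),  # acid-cleavable ICAT: nine carbon-13 atoms
}

def icat_pair_spacing(peptide, charge, reagent="d8"):
    """Expected m/z gap between light and heavy forms, assuming one tag per cysteine."""
    n_cys = peptide.count("C")
    return n_cys * SHIFT_PER_TAG[reagent] / charge
```

A doubly charged peptide with one cysteine thus appears as a pair of peaks about 4.03 m/z units apart with the d8 reagent, while a peptide without cysteine carries no tag at all.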

ICAT is a powerful technique; however, the reagent is expensive, the technique requires relatively large amounts of starting material, and it is not easily compatible with cell fractionation and column purification steps. Furthermore, only cysteine-containing peptides and proteins are quantitated. Thus, researchers have developed many additional strategies for quantitation (59). Recent reports from our laboratory describe a biological method for incorporating mass tags called SILAC, for stable isotope labeling by amino acids in cell culture (76, 77). In this method, tissue culture cells are grown in a medium that lacks a particular essential amino acid; this amino acid is then supplied to the cells in one of two forms, one containing natural-abundance isotopes and the other containing stable isotopes. After five to seven population doublings, all proteins within the cells are completely labeled with the amino acid provided. Any essential amino acid can be used with this technique; we used 13C-labeled arginine, 2H-labeled leucine, and 13C-labeled lysine. One version of the proteome becomes completely encoded by the labeled amino acid and is therefore distinguishable from another proteome. Metabolic labeling is usually limited to microorganisms and cell culture, but whole metazoans can also be labeled (51). Figure 2 describes an application of this encoding to elucidate the components of lipid rafts, a cellular structure that cannot easily be purified.
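Two numbers behind SILAC can be worked out directly. The first is the mass shift per labeled residue; the text names only the isotope, so the sketch assumes fully 13C-labeled arginine and lysine (six carbons each) and leucine carrying three deuterium atoms. The second is how complete labeling is after a given number of doublings, under the simplifying assumption that pre-existing unlabeled protein is only diluted by cell division (protein turnover and amino acid recycling are ignored).

```python
C12, C13 = 12.0, 13.0033548378          # carbon isotope masses (Da)
H1, H2 = 1.00782503207, 2.0141017778   # hydrogen isotope masses (Da)

# Assumed label forms: 13C6-Arg, 13C6-Lys, and Leu-d3.
LABEL_SHIFT = {
    "R": 6 * (C13 - C12),
    "K": 6 * (C13 - C12),
    "L": 3 * (H2 - H1),
}

def silac_shift(peptide, labeled_residue):
    """Heavy-minus-light mass difference for a peptide in a single-label experiment."""
    return peptide.count(labeled_residue) * LABEL_SHIFT[labeled_residue]

def unlabeled_fraction(doublings):
    """Fraction still unlabeled if old protein is only diluted by division."""
    return 0.5 ** doublings
```

Under this dilution model, five doublings leave about 3% (1/32) of protein unlabeled and seven leave under 1% (1/128), in line with the five to seven doublings quoted in the text.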

Proteomics in an Array Format

The yeast two-hybrid screen was first used in 1989 as a tool to elucidate protein-protein interactions (22). It is now a mature technology and has been used extensively by many labs in a low-throughput manner. It has also been automated and applied to various large-scale projects, as described below. In one method, each bait was screened individually against each prey in a mating assay (105). Another approach was to mate pools of 96 strains expressing the constructs with each other and to screen the resulting positive colonies (44). Two-hybrid studies are often fraught with false positives and false negatives, and on a high-throughput scale these results cannot be individually validated biologically. The method has been modified to examine protein-DNA interactions, as well as interactions requiring a bridging protein or a post-translational modification; other modifications reconstitute a protein activity other than that of a transcription factor, making the method more amenable to proteins not suited to the traditional yeast two-hybrid system (18, 98). However, as yet, none of the alternative reporter systems has been shown to work at a proteomic scale.

Protein microarrays are another approach for analyzing large sets of proteins. In principle, protein arrays are extremely versatile and could be adapted for use in many areas of biology (e.g., lipids, carbohydrates, proteins, nucleic acids) (92). In this technique, individually purified ligands such as proteins, peptides, antibodies, antigens, aptamers, carbohydrates, or small molecules are spotted onto a derivatized surface; such arrays are generally used for examining protein expression levels for protein profiling and clinical diagnostics (capture chips). It is difficult to produce specific antibodies (or antibody pairs for ELISA-type sandwich assays) for large numbers of human proteins, and most applications so far have been limited to cytokines, for which such antibodies exist. Arrays containing recombinant proteins can also be used to monitor protein activities, such as binding to small molecules, proteins, antibodies, and drugs, and to study post-translational modifications (93). One recent study used an antibody array to study the activation of receptor tyrosine kinase pathways in tumor cells (72); another used a human protein chip
for antibody screening and serum profiling (60). In addition, the ability to assay small-molecule and drug-binding properties makes this technique attractive to pharmaceutical companies for use in drug discovery and validation. Apart from the difficulty in expressing large numbers of human proteins in soluble form (16), a main challenge with protein arrays is presenting the proteins in a native state such that they can still participate in interactions (31). Additionally, these are in vitro assays and need to be followed up in vivo. An interesting application of protein microarrays is to check the specificity of antibodies by incubating recombinant protein arrays with them. One group tested eleven monoclonal and polyclonal antibodies on a protein chip and found that they recognized anywhere from a single target (complete specificity) to about 1000 proteins (no detectable specificity) (67).

In transfected cell microarrays, mammalian expression vectors containing full-length cDNAs (or protein domains) are gridded onto a slide that is then treated with transfection reagents to complex the DNA with lipids (13, 121). Transfectable cells are then overlaid onto the slide, and the DNA is taken up by cells growing on the printed areas, creating colonies of ∼30–80 transfected cells on the slide. These transfected cells can then be assayed for morphological changes, protein localization, binding properties, responses to drug treatment, etc. If the expressed proteins were fused with green fluorescent protein (GFP), then protein localization could be followed in response to cellular stimuli such as growth factor treatment. Small interfering RNA (siRNA) constructs could also be arrayed on the chip and transfected into cells in a high-throughput gene knockdown strategy (121).

Structure and Imaging

Molecular structure has become a powerful tool in elucidating biological mechanisms, but structures have so far been determined one protein at a time. One aim of structural proteomics (which thematically is grouped with proteomics but is usually called structural genomics) is to achieve comprehensive coverage of single-protein or domain structures. Today, the most common technique for structural analysis is X-ray crystallography, and this methodology has benefited enormously from automation at nearly all stages of the process (1). Nevertheless, protein expression and crystallization are still difficult, and it is not clear whether structure determination can be scaled up in the same way as other proteomic technologies. Nuclear magnetic resonance (NMR), 2D electron microscopy (EM), and electron tomography can also supply important information regarding protein subunit shape, contacts, and proximity. Two-dimensional EM can provide data to ∼7–8 Å; however, the process is long and laborious and needs to be streamlined and automated for this technique to be used in high-throughput projects. X-ray crystallography provides high-resolution structures, but the proteins must be available in large quantities and form appropriate crystals. Single-particle EM requires little material but provides only medium-resolution structures.

Many cellular proteins are involved in higher-order, macromolecular structures, and these structures are often the entities responsible for carrying out cellular
processes. To understand the inner workings of individual cells, we need to understand how proteins are oriented in complexes and how this affects their function. The next goal for structural genomics will be to develop hybrid approaches in which both high throughput and maximum resolution are achieved by integrating information from various complementary sources (88). An exciting method is electron tomography, which provides a 3D view of the proteome of single, unstained cells in a snap-frozen state (66). High-resolution X-ray structures of proteins or small protein complexes are then fitted in silico to the relatively low-resolution cryoelectron microscopy data, presenting information about spatial relationships between complexes.

Optical imaging with fluorescent fusion proteins continues to develop dramatically and now allows dynamic measurement of several tagged proteins in live cells. Protein-protein interactions can also be detected by an imaging technique known as FRET (fluorescence resonance energy transfer). FRET is based on the principle that energy from an excited donor fluorophore can be transferred to an acceptor fluorophore in close proximity (<100 Å) (83); often the fluorophores are yellow fluorescent protein (YFP) and cyan fluorescent protein (CFP). One advantage of FRET is that it can be used in live cells to provide information on proteins in a physiological state and in their correct cellular location, and to perform temporal experiments that follow transient interactions (83). There are many other fluorescence resonance or fluorescence lifetime techniques that could be scaled up. Imaging can even reveal the dynamics of protein activation states in response to cell stimulation (110).
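
The steep distance dependence behind the <100 Å proximity requirement is captured by the Förster relation E = 1/(1 + (r/R₀)⁶), where r is the donor-acceptor distance and R₀ is the distance at which transfer is 50% efficient. The sketch below assumes R₀ = 50 Å, a round figure chosen for illustration; the actual Förster radius depends on the fluorophore pair and its orientation and is not a value taken from this article.

```python
def fret_efficiency(distance_A: float, r0_A: float = 50.0) -> float:
    """Forster transfer efficiency for a donor-acceptor pair.

    distance_A: donor-acceptor separation in angstroms.
    r0_A: Forster radius (distance of 50% efficiency); the 50 A default is an
          illustrative assumption, not a measured value for any specific pair.
    """
    if distance_A <= 0 or r0_A <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (distance_A / r0_A) ** 6)


for r in (25, 50, 75, 100):
    print(f"r = {r:>3} angstrom -> E = {fret_efficiency(r):.3f}")
```

Because of the sixth-power term, doubling the separation from R₀ to 2R₀ drops the efficiency from 50% to about 1.5%, so a measurable FRET signal is a strong indication of close proximity.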

GENE AND PROTEIN EXPRESSION

Much of the information processing performed by the cell is concerned with regulating the amount of active gene product to synthesize in any given situation. Accordingly, the absolute and relative expression levels of all gene products are a crucial data set, particularly for disease characterization. Oligonucleotide or cDNA microarrays measure the levels of mRNA as a proxy for the level of active protein. There is an ongoing discussion about the validity of this assumption, because there is a level of biological regulation between transcript and protein, as well as regulation of protein degradation. The cell generally synthesizes more mRNA to increase the level of the corresponding protein. However, there are many proteins, for example p53, whose activity is regulated by processes such as rate of degradation or compartmentalization. Large-scale measurements of the correlation between message and protein generally produce moderate correlation coefficients, but this may reflect limitations of current technology as much as differential regulation (39). It will probably be necessary to assess this correlation separately for different groups of proteins. This will then aid the interpretation of microarray results, highlight where protein measurements are necessary, and point out interesting areas of post-transcriptional gene regulation. Regardless of the desirability of direct protein measurements, today mRNA measurements are
technically simpler, achieve higher sensitivity, and can be performed in much higher throughput.
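
A minimal sketch of how such message-protein correlations are typically computed: because abundances span orders of magnitude, they are log-transformed before a Pearson coefficient is taken (a rank-based Spearman coefficient is a common alternative). The gene names and values below are invented placeholders, not data from any study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def log_correlation(mrna, protein):
    """Correlate log10 abundances of genes measured at both mRNA and protein level."""
    shared = sorted(set(mrna) & set(protein))  # only genes present in both data sets
    xs = [math.log10(mrna[g]) for g in shared]
    ys = [math.log10(protein[g]) for g in shared]
    return pearson(xs, ys)


# Hypothetical abundances (arbitrary units); purely illustrative.
mrna = {"geneA": 120.0, "geneB": 15.0, "geneC": 3.0, "geneD": 800.0, "geneE": 40.0}
protein = {"geneA": 9.0e4, "geneB": 2.0e4, "geneC": 1.5e3, "geneD": 1.2e5, "geneE": 3.0e3}

print(f"r(log mRNA, log protein) = {log_correlation(mrna, protein):.2f}")
```

Computing the coefficient separately for subsets of genes (for example, by functional class) is one way to follow the suggestion above that the correlation be assessed per protein group.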

One area of intense interest for expression profiling is the serum proteome, because it is easily accessible, gene chips cannot measure it, and the serum levels of proteins can be excellent biomarkers of disease. However, the analytical problems are daunting, as serum is dominated by a few highly abundant protein classes, resulting in a dynamic range requirement of up to ten orders of magnitude (9). Albumin and immunoglobulin removal only partially alleviates this problem and does so at the expense of interesting bound proteins. So far, published studies have identified a few hundred proteins in serum (3, 84, 103). Mass spectrometric serum proteomics is still in very early stages, and interlaboratory variance, although not formally determined, seems to be high. An interesting twist of serum proteomics is to selectively capture glycosylated peptides and then release them from beads (114). This technique vastly simplifies the peptide mixture and retains most secreted and cell surface proteins because they are glycoproteins. A technique recently described for selectively retaining only the N-terminal peptides of proteins (blocked or unblocked) would offer another avenue for drastically simplifying the peptide mixture to be analyzed (27).

Another serum proteomics technique uses MALDI from special surfaces onto which serum is applied. The resulting spectra are analyzed for patterns that correlate with clinical features (82, 111). The attraction of this technique lies in its simplicity and clinical applicability. Note, however, that MALDI MS of samples dominated by highly abundant proteins will yield only breakdown products of these proteins or other highly abundant peptides in the low mass range, whereas many clinical markers, e.g., cytokines, are exceedingly rare and are not detected. Nevertheless, the idea of obtaining complex pattern information by MS and using statistical techniques to correlate these patterns with disease is extremely attractive and could potentially revolutionize diagnostic practice, particularly if coupled to state-of-the-art LC MS/MS techniques.
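
As a minimal illustration of the statistical side of pattern-based diagnostics, the sketch below ranks m/z peaks by a Welch t-statistic between case and control spectra. The peak intensities are hypothetical, and this is not the classification method of the cited studies; a real analysis would also need spectrum normalization, multiple-testing correction, and validation on independent samples.

```python
import math
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch's t-statistic for one feature (does not assume equal variances)."""
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


def rank_peaks(case_spectra, control_spectra):
    """Rank m/z peaks by |t|; each spectrum maps m/z -> intensity.

    Assumes every spectrum reports the same set of m/z peaks.
    """
    scores = {}
    for mz in case_spectra[0]:
        scores[mz] = welch_t([s[mz] for s in case_spectra],
                             [s[mz] for s in control_spectra])
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical peak intensities at three m/z values (arbitrary units).
cases = [{4000: 10.0, 5000: 30.0, 6000: 5.0},
         {4000: 11.0, 5000: 34.0, 6000: 6.0},
         {4000: 9.0, 5000: 31.0, 6000: 5.5}]
controls = [{4000: 10.5, 5000: 20.0, 6000: 5.2},
            {4000: 9.5, 5000: 22.0, 6000: 5.8},
            {4000: 10.0, 5000: 21.0, 6000: 5.1}]

for mz, t in rank_peaks(cases, controls):
    print(f"m/z {mz}: t = {t:+.2f}")
```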

Obtaining catalogues of proteins expressed in specific cell types and tissues would be useful for proteomic and other research areas and has been the subject of much effort in 2D gel-based research over the last decade, albeit with relatively modest success. Several initiatives have been started under the auspices of the Human Proteome Organization (HUPO) to determine the liver, brain, serum, and other proteomes. Also, Schirle et al. (90) surveyed abundant proteins in widely used cell lines using modern mass spectrometric techniques. Surprisingly, among 2431 detected proteins, only 100 were common to all of these cell lines. The same study also shows that reproducibility in the proteins identified by LC MS/MS runs is generally 60% to 70%; therefore, such runs should be repeated three times to approach saturation. Including some measure of protein abundance in such surveys would further increase their value. Many microorganisms have been subjected to proteomic surveys as well. Notably, the publication of the genome of the malaria parasite was accompanied by two proteomics studies (23, 56). These studies verified many protein coding regions, determined the stage-specific expression of many genes,
and identified candidate vaccine or drug targets by sequencing the proteins in membrane-enriched fractions.
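
The recommendation to repeat runs three times can be rationalized with a simple coverage model. If each run independently detects a given protein with probability p (taking p ≈ 0.6–0.7 from the reproducibility figures above), the chance of detecting it in at least one of k runs is 1 − (1 − p)^k. Treating runs as independent is an assumption of this sketch; in practice, low-abundance proteins tend to be missed systematically, so real gains from repetition are smaller. The function names are hypothetical.

```python
import math


def coverage(p_single_run: float, runs: int) -> float:
    """Probability that a protein is seen in at least one of `runs` replicate runs."""
    return 1.0 - (1.0 - p_single_run) ** runs


def runs_needed(p_single_run: float, target: float) -> int:
    """Smallest number of independent runs reaching the target cumulative coverage."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single_run))


for p in (0.6, 0.7):
    print(p, [round(coverage(p, k), 3) for k in (1, 2, 3, 4)])
print(runs_needed(0.65, 0.95))
```

With p = 0.65, three runs reach about 96% cumulative coverage, while a fourth run adds only a few percent more.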

The first nearly comprehensive expression proteomics study was recently published. Ghaemmaghami et al. (28) tagged all yeast open reading frames with the Tandem Affinity Purification (TAP) tag by genomic replacement of the endogenous genes and then measured protein levels by quantitative Western blotting. Expression levels of 70% of yeast proteins were determined, with abundances ranging from 50 to 1 million copies per cell. These data give us the first system-wide look at protein abundances in a eukaryotic cell and will allow absolute protein amounts to be superimposed onto other yeast proteome data sets.

Simultaneously, Huh et al. (42) tagged each open reading frame in the yeast genome with GFP. The proteins were first assigned to 12 broad and then to 11 narrower cellular compartments. The Snyder group had previously pioneered protein localization on a whole-proteome scale with a transposon-based strategy (52); however, the new study used genomic replacement with the fusion constructs, such that proteins were expressed at endogenous levels. Together with the companion study, this paper also establishes that at least 80% of proteins are expressed in log-phase growing yeast cells. Some caveats for both the localization and the protein expression studies are that the tag can sometimes interfere with correct localization and that some membrane proteins might have been underrepresented in the Western blots. Furthermore, genomic replacement is only possible in yeast; in other organisms, proteins have to be overexpressed from vectors, or stable cell lines must be generated.

In the context of a large-scale antibody generation project, the Uhlen group produced antibodies to a large fraction of the proteins encoded by chromosome 21 genes. These were used to map the distribution of the corresponding proteins in a panel of human tissues and cell lines (6). Several groups have used such tissue arrays, consisting of slices of paraffin-embedded tissue rods, to perform immunohistochemistry in a large-scale format (50), and a novel cell lysate array was also recently used to investigate protein expression in the NIH cancer cell line compendium (73).

PROTEIN INTERACTIONS AND COMPLEXES

Proteomic Determination of Protein-Protein Interactions

Proteins almost never work completely on their own. They are often members of multiprotein complexes, such that it is even difficult to speak of their function or their 3D structure outside of this context, as beautifully illustrated by the X-ray structure of the ribosome (19). Apart from forming stable multiprotein complexes, proteins associate transiently with their targets to modify them, regulate them by steric effects, or translocate them to different cellular compartments. All these mechanisms are intimately connected to protein function, and interaction partners are among the most useful leads in understanding the biology of a protein. Protein interactions have been studied by immunoprecipitation and Western blotting or by
protein microsequencing for decades, but only recently have researchers begun to tackle whole “interactomes,” that is, the totality of protein interactions in a proteome. The most straightforward method is to precipitate tagged proteins and sequence associated proteins by MS. As usual, yeast has served as the frontline model organism. In two independent large-scale studies, hundreds of yeast proteins were tagged either by a small epitope tag (40) or by the TAP tag (26), and coprecipitating proteins were identified by LC MS/MS and MALDI mass fingerprinting, respectively. By its nature, this experiment identifies protein complexes rather than binary interactions. A startling outcome was how much of the proteome is organized into protein complexes. One study focused on signal transduction proteins and proteins involved in DNA repair, revealing an unexpectedly dense submap of interactions of kinases and phosphatases; in the other study, many yeast proteins with human homologs were targeted (11, 40). Besides yielding interactions for about 25% of the yeast proteome, the TAP-tag study also revealed a higher-order organization of complexes of complexes, defined via the shared components in different pulldowns (26). For metazoans, the TAP tag method has been refined by including siRNA-mediated downregulation of the endogenous form of the bait protein. In some cases this is necessary to allow the tagged protein access to the complex (24).
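
Because a pulldown returns a set of copurifying proteins rather than pairs, any comparison with binary data requires a convention for turning a complex into pairwise interactions. Two conventions are common in the literature: the “spoke” model links the bait to each prey, and the “matrix” model links every pair of complex members. The sketch below, with hypothetical protein names, shows how much the choice changes the count.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: one bait-prey pair per copurifying protein."""
    return {frozenset((bait, p)) for p in preys if p != bait}


def matrix_pairs(bait, preys):
    """Matrix model: every pair of proteins found in the purified complex."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


bait, preys = "BaitX", ["P1", "P2", "P3", "P4"]
print(len(spoke_pairs(bait, preys)), len(matrix_pairs(bait, preys)))  # prints "4 10"
```

The spoke count grows linearly with complex size and the matrix count quadratically, so the same purifications can yield very different “interaction” totals depending on the convention.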

The above methods only captured interactions that have a certain minimum affinity and that occur in undisturbed cells. This limitation arises from the trade-off between stringency of complex purification to avoid background proteins and loss of specific but low-affinity binders. Quantitative MS has now sidestepped this trade-off in the following way: The bait and a closely related, but binding-deficient, control bait are exposed to normal and isotopically labeled cell lysate, respectively. Associated proteins are mixed and analyzed together such that a quantitative ratio indicates specific binding to the bait but not the control. In principle, this method is independent of the background of nonspecifically binding proteins, which are identified by not showing a significant quantitative change (15, 85). It has been applied to signal-dependent interactions, in which case only one population of cells is stimulated and the quantitative ratios indicate interactions mediated by phosphorylation (15). In another adaptation of the same principle, synthetic peptides were used as baits to selectively capture phosphopeptide-binding proteins, whereas the nonphosphorylated peptide served as a control (91).
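
The decision rule described above can be sketched as follows: each protein's bait/control intensity ratio is read from the two isotopic channels, background proteins sit near a ratio of 1, and only proteins clearly enriched on the bait are called specific. The fourfold cutoff and protein names are arbitrary assumptions for illustration; in practice the threshold would be set from the observed ratio distribution.

```python
import math


def classify_binders(ratios, min_fold=4.0):
    """Split proteins into specific binders and background by bait/control ratio.

    ratios: protein -> (bait-channel intensity / control-channel intensity).
    Nonspecific binders attach equally to bait and control (log2 ratio ~0);
    specific binders are enriched on the bait.
    """
    cutoff = math.log2(min_fold)
    specific, background = [], []
    for protein, ratio in sorted(ratios.items()):
        if ratio <= 0:
            raise ValueError(f"ratio for {protein} must be positive")
        if math.log2(ratio) >= cutoff:
            specific.append(protein)
        else:
            background.append(protein)
    return specific, background


# Hypothetical ratios; purely illustrative.
observed = {"prot_A": 1.1, "prot_B": 0.9, "prot_C": 1.3, "prot_D": 8.5, "prot_E": 12.0}
specific, background = classify_binders(observed)
print("specific:", specific)
print("background:", background)
```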

The yeast two-hybrid system was first applied on a proteome-wide scale in yeast. Recently, this system yielded the first interaction maps of metazoans, namely Drosophila (30) and C. elegans (58). The Drosophila map was the result of a large-scale effort involving two different screens and the selection and sequencing of more than 60,000 yeast plasmids. Together, this resulted in more than 20,000 interactions, which were narrowed down to a high-confidence set of 4780 interactions between 4679 proteins. The density of biologically relevant interactions in the high-confidence set was estimated at around 40%. In the C. elegans study, the recently constructed set of full-length open reading frames (87), rather than a cDNA library, was used in the screen, resulting in 2157 high-confidence interactions.
Comparison with the literature indicated a false negative rate of 90% in this first-pass data (58). Data quality was confirmed by coprecipitation of a test set of high-confidence interactors [pulldown of glutathione-S-transferase (GST)-tagged bait with Myc-tagged prey in 293 cells].

In principle, protein chips can also be used to determine protein activities and interactions. The Snyder group arrayed yeast proteins overexpressed in His-tagged form onto slides to measure enzymatic activity (119). Protein interaction was assessed with a yeast proteome chip (118): Tagged calmodulin selectively bound to several wells, recapitulating known interaction partners and revealing many novel ones. The same study also showed that the recombinant proteins could be queried with phosphoinositols for selective binding.

Phage display allows display of up to 10¹⁴ different peptide motifs on the coat protein of phage particles. The technique is especially suited to determining peptide- or domain-protein interactions (95) and, in combination with yeast two-hybrid data, has been used to establish interaction partners of the entire repertoire of yeast SH3 domains (104).

Yaffe et al. (20) used a directed library of synthetic phosphopeptides biased to resemble the motif generated by cyclin-dependent kinases and screened it against 680 pools of in vitro–translated ³⁵S-labeled proteins. After repeated subdivision of pools of binding proteins, two clones were obtained, one encoding the mitotic kinase Plk1. This led to the identification of the polo box domain in Plk1 as a phosphothreonine- or phosphoserine-recognizing domain and provided a mechanism for Plk1 to localize to the centrosome during mitosis, when proteins containing the phosphopeptide motifs are phosphorylated.

Building Networks

Large-scale interaction data are irresistible to bioinformaticists seeking to determine the topology and abstract features of protein networks. There is much discussion about the general topology of protein interaction maps viewed as mathematical graphs or networks, particularly whether they are “scale-free” or modular. No clear picture has yet emerged (30, 89). The position of a protein in the network, for example whether it acts as a highly connected hub, correlates with lethality (46), which is consistent with a network laid out for maximum redundancy and buffering capability. Furthermore, highly connected proteins may evolve more slowly because of coevolutionary constraints (rather than increased essentiality) (25).
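
In graph terms, the number of interaction partners of a protein is its degree, and hubs are the highly connected nodes; the degree distribution is what arguments about “scale-free” topology are based on. The sketch below computes degrees from an edge list and reports proteins above a degree cutoff. The edge list and cutoff are hypothetical; real analyses define hubs relative to the network's own degree distribution.

```python
from collections import Counter


def degrees(edges):
    """Number of distinct interaction partners per protein."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions when counting partners
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {p: len(s) for p, s in partners.items()}


def hubs(edges, min_degree):
    """Proteins with at least `min_degree` partners, most connected first."""
    deg = degrees(edges)
    return sorted((p for p, d in deg.items() if d >= min_degree),
                  key=lambda p: (-deg[p], p))


def degree_distribution(edges):
    """Number of proteins having each degree k."""
    return dict(sorted(Counter(degrees(edges).values()).items()))


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"), ("E", "F")]
print(hubs(edges, min_degree=3))  # ['A']
print(degree_distribution(edges))
```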

A large effort has commenced to validate protein-protein interactions from large-scale projects. It was noticed early on that these data sets have little overlap with each other even when the same method is used. Part of the reason is that the potential interactome is extremely large (6000 × 6000 for yeast and about 30,000 × 30,000 for human), and the current screens are by no means saturating. Depending on the method of counting interactions, the overlap between the two yeast mass spectrometric data sets is up to 40% (107), which is reasonably high considering the state of technology. Compared to literature-derived interactions,
many protein interactions have been missed in large-scale studies. The main reason for missing true interactions in the MS-based assays is low abundance of the interaction partner or the fact that binding partners are not expressed under the conditions of the experiment. This is partly being addressed with ongoing developments in sensitivity and sequencing speed as well as with quantitative methods (see above). In the yeast two-hybrid system, interactions occur in the nucleus of a yeast cell and between two fusion proteins, which is unfavorable for membrane proteins, for example.
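
Both the size of the search space and the dependence of “overlap” on how it is counted can be made concrete. The sketch below counts the unordered protein pairs possible in a proteome of n proteins, n(n − 1)/2 plus n self-pairs, and then compares two toy interaction sets with two common overlap measures. The data sets are invented, and neither measure is claimed to be the one used in the cited comparison.

```python
def candidate_pairs(n_proteins: int, include_self: bool = True) -> int:
    """Number of unordered protein pairs that could, in principle, interact."""
    pairs = n_proteins * (n_proteins - 1) // 2
    return pairs + n_proteins if include_self else pairs


def overlap_measures(set_a, set_b):
    """Shared pairs, Jaccard index, and overlap relative to the smaller data set."""
    a = {frozenset(p) for p in set_a}  # frozenset makes (A, B) equal to (B, A)
    b = {frozenset(p) for p in set_b}
    shared = len(a & b)
    return {
        "shared": shared,
        "jaccard": shared / len(a | b),
        "fraction_of_smaller": shared / min(len(a), len(b)),
    }


print(f"yeast: {candidate_pairs(6000):,}  human: {candidate_pairs(30000):,}")

screen1 = [("A", "B"), ("A", "C"), ("C", "D"), ("E", "F")]
screen2 = [("B", "A"), ("C", "D"), ("G", "H"), ("H", "I"), ("I", "J"), ("J", "K")]
print(overlap_measures(screen1, screen2))
```

The same two shared interactions read as 25% overlap by the Jaccard index but 50% relative to the smaller screen, which is one reason reported overlap figures depend on how they are counted.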

Ingenious methods have been devised to gauge the accuracy of existing data sets apart from benchmarking against existing literature data sets. For example, Bader et al. (12) compared yeast two-hybrid and coprecipitation data on the basis of the nodes between protein pairs in the protein interaction maps. Interaction data have also been overlaid with coexpression data, Gene Ontology (GO) annotation of the proteins, and cellular compartment (see Figure 3). So-called genomic context methods can also predict protein-protein interactions purely from DNA sequences (64). Coinheritance, conservation of gene order and proximity in the genome, and gene fusion in another genome are all hints that may point to physical interaction between a protein pair. These methods are powerful in prokaryotes, where they are predicted to soon exceed homology searches in usefulness (43).
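
Coinheritance is commonly scored with phylogenetic profiles: each protein is encoded as a presence/absence vector across a set of sequenced genomes, and proteins with near-identical profiles are flagged as candidate functional partners. The sketch below uses invented profiles over eight hypothetical genomes and an illustrative Hamming-distance cutoff. It covers only the coinheritance signal, not gene order or gene fusion.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each protein across 8 genomes.
profiles = {
    "protA": (1, 1, 0, 1, 0, 0, 1, 1),
    "protB": (1, 1, 0, 1, 0, 0, 1, 1),
    "protC": (1, 1, 0, 1, 0, 1, 1, 1),
    "protD": (0, 0, 1, 0, 1, 1, 0, 0),
}


def hamming(p, q):
    """Number of genomes in which exactly one of the two proteins is present."""
    return sum(x != y for x, y in zip(p, q))


def coinherited_pairs(profiles, max_distance=1):
    """Protein pairs whose profiles differ in at most `max_distance` genomes."""
    hits = []
    for a, b in combinations(sorted(profiles), 2):
        d = hamming(profiles[a], profiles[b])
        if d <= max_distance:
            hits.append((a, b, d))
    return hits


print(coinherited_pairs(profiles))
```

Proteins present in every genome share identical all-present profiles without being functionally linked, so such profiles carry little information and are usually treated with caution.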

Unfortunately, the same is not true in eukaryotes, and measures such as coregulation can be questionable indicators of possible binding (12, 58). Intuitively, it makes sense to combine the interaction data sets. This cannot be done by simple intersection, because the low degree of overlap would mean that most interactions are lost. However, more sophisticated mathematical methods such as Bayesian networks allow experimental data, or data predicted from genome context or coexpression, to be merged into a single probability of interaction. In the study of Jansen et al. (45), this approach was applied to the four yeast interaction sets, genome context, and coexpression data sets. They concluded that in silico predictions are as accurate as experimental data sets, if not as comprehensive. However, in their analysis almost all experimental interactions were lost in the process of Bayesian merging of the data sets.
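
The core of such a merge can be sketched with the simplest Bayesian combination, naive Bayes, which assumes that the evidence sources are conditionally independent. Each source contributes a likelihood ratio, the ratios multiply, and the product scales the prior odds of interaction. The prior and likelihood ratios below are invented for illustration. This sketch does not model correlated sources, which a full Bayesian network can do.

```python
import math


def posterior_probability(prior: float, likelihood_ratios) -> float:
    """Combine independent evidence with Bayes' rule in odds form.

    posterior_odds = prior_odds * product(LR_i), where
    LR_i = P(evidence_i | interacting) / P(evidence_i | not interacting).
    Log-odds are summed to avoid overflow when many sources are combined.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical numbers: prior of 1 true interaction per 600 candidate pairs, and
# likelihood ratios for a two-hybrid hit, a pulldown hit, and coexpression.
prior = 1 / 600
evidence = {"two_hybrid": 20.0, "pulldown": 30.0, "coexpression": 5.0}
print(f"P(interaction) = {posterior_probability(prior, evidence.values()):.3f}")
```

Because the prior odds of any given pair interacting are low, even a single strong experimental hit can leave the posterior below 50%, which is one way interactions supported by only one experiment can be discarded during merging.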

In the experimental data sets, false negative rates are up to 90% (87), and in the latest data sets false positive rates are still more than 50% even in the high-confidence set of interactions (12). So, what do we make of the large-scale protein interaction data? We have learned a great deal about the general topology of the interactome, and the bioinformatic analysis methods developed recently will be valuable in the future. However, the data sets produced so far will likely be under constant revision as technology improves. Thus, it will be several years before any authoritative protein interaction maps are available, especially for metazoans. In silico methods will contribute increasingly to the study of protein interactions, both in validation and in prediction. Time-, signal-, compartment-, and modification-dependent interactions have hardly been addressed and will keep the proteomics community busy for years to come.

Organellar Proteomics

It is a long-standing aim of cell biology to characterize the protein components of all cellular structures and organelles. MS is ideally suited to this task because it offers a straightforward way to analyze complex mixtures of proteins with the dynamic abundance range found in organelles. Thus, at least a first pass has been made on most organelles, including non-membrane-bound nuclear structures (17, 102). A main take-home message is that many protein complexes and organelles are unexpectedly complex. For example, the spliceosome contains upwards of 300 proteins (86, 117), and the nucleolus contains more than 600 proteins (J.S. Andersen, Y.W. Lam, A.K.L. Leung, S.E. Ong, C.E. Lyon, A.I. Lamond & M. Mann, unpublished observations). Altogether, organellar proteomics has localized several thousand proteins in different structures, providing important clues to their cellular roles.

As with protein interactions, verifying potential members of large protein complexes is a major effort. Early on it was established that fluorescence localization is an ideal companion to mass spectrometric organellar proteomics (71). However, in organisms other than yeast, tagged proteins need to be overexpressed; the tag may interfere with proper localization, and the localization may not be unequivocal. Our laboratory recently devised a method that distinguishes components of protein complexes from other proteins purely on the basis of fractionation behavior (7). In the final purification of the complex, we tracked the abundance of every peptide in each sucrose gradient fraction by MS. Only the true organelle components codistributed, whereas the background proteins showed a different distribution. Applied to the human centrosome, which is notoriously difficult to purify, this method allowed identification of essentially all structural components, two thirds of the regulatory ones, and at least 70 strong candidates among a background of approximately 1000 other proteins (7).
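
The codistribution criterion can be sketched as profile matching. Each protein's abundance across the gradient fractions is normalized to sum to 1, compared with a consensus profile built from known components, and accepted if the deviation is small. The intensities, marker set, and cutoff are hypothetical, and the sum-of-squared-differences score is one simple choice, not necessarily the statistic used in the cited work.

```python
def normalize(profile):
    """Scale abundances across fractions so that they sum to 1."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain positive signal")
    return [x / total for x in profile]


def consensus(profiles):
    """Mean normalized profile of known complex components."""
    norm = [normalize(p) for p in profiles]
    return [sum(col) / len(norm) for col in zip(*norm)]


def deviation(profile, reference):
    """Sum of squared differences between a normalized profile and the reference."""
    return sum((x - r) ** 2 for x, r in zip(normalize(profile), reference))


# Hypothetical MS intensities in five sucrose-gradient fractions.
markers = [[0, 2, 10, 4, 1], [0, 1, 12, 5, 1]]
candidates = {
    "candidate_1": [0, 2, 11, 4, 1],  # peaks in the same fraction as the markers
    "candidate_2": [8, 6, 2, 1, 0],   # peaks in lighter fractions
}
ref = consensus(markers)
for name, prof in candidates.items():
    d = deviation(prof, ref)
    print(f"{name}: deviation {d:.4f} -> {'codistributes' if d < 0.01 else 'background'}")
```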

POST-TRANSLATIONAL MODIFICATIONS

The activity of many proteins is modulated through the addition of covalent modifiers (e.g., phosphate groups or ubiquitin moieties) or through proteolytic cleavage. These post-translational modifications (PTMs) can affect protein turnover, localization, activity, or binding interactions: Ubiquitin modification targets proteins for degradation or internalization, farnesylation tethers proteins to membranes, and phosphorylation can activate a kinase or provide docking sites for binding partners. Phosphorylation, acetylation, methylation, glycosylation, ubiquitination, and lipid modifications are among the most common PTMs; more than 200 modifications have been described (for a listing, see http://www.abrf.org/index.cfm/dm.home).

Because a given modification results in a change in the molecular mass of the affected amino acid, MS is the method of choice for characterizing post-translational modifications, with its sensitivity, its high mass accuracy, and its ability to deal with complex mixtures (63). However, the modified form of a peptide can represent only a small fraction of the total amount of that peptide
present in a sample, making the identification of a PTM technically challenging, particularly in very complex samples. Peptide mapping with different proteolytic enzymes is often used to cover as much of a protein sequence as possible, and PTMs are then determined from the mass difference generated by the modification (e.g., a mass difference of 204 Da indicates a farnesyl lipid modification). If the modification is stable, then the specific residue(s) on which it is located can be pinpointed using the tandem mass spectrum; however, the more labile PTMs, such as phosphoserine and phosphothreonine, can often only be localized to a particular peptide.
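
The mass-difference logic can be sketched as a lookup: subtract the calculated mass of the unmodified peptide from the measured mass and match the difference against known modification masses within a tolerance. The table holds standard monoisotopic mass shifts for a few common PTMs, including the Gly-Gly remnant that trypsin digestion leaves on ubiquitinated lysines. The tolerance and function name are illustrative assumptions; real search software also considers combinations of modifications and which residues can carry them.

```python
# Monoisotopic mass shifts (Da) of some common modifications.
PTM_DELTAS = {
    "phosphorylation (+HPO3)": 79.96633,
    "acetylation (+C2H2O)": 42.01057,
    "methylation (+CH2)": 14.01565,
    "oxidation (+O)": 15.99491,
    "farnesylation (+C15H24)": 204.18780,
    "ubiquitin Gly-Gly remnant (+C4H6N2O2)": 114.04293,
}


def explain_mass_shift(observed_mass, unmodified_mass, tol_da=0.01):
    """Return the modifications whose mass shift matches observed - unmodified."""
    delta = observed_mass - unmodified_mass
    return [name for name, shift in PTM_DELTAS.items() if abs(delta - shift) <= tol_da]


# Hypothetical peptide: calculated unmodified mass 1500.700 Da, measured 1580.667 Da.
print(explain_mass_shift(1580.667, 1500.700))  # ['phosphorylation (+HPO3)']
```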

Studying the exact sites of post-translational modifications of protein populations remains challenging; however, if the experiment can be converted into a conventional protein identification problem, it is relatively easy to solve. For example, if the population of modified proteins can be affinity purified with an antibody or a tag, then the resulting protein mixture can be analyzed and a list of modified proteins can be generated. Peng et al. (81) described a strategy to specifically purify ubiquitinated proteins from yeast and to identify some of the modified lysine residues. Yeast cells expressing a 6xHis-tagged ubiquitin construct were lysed, and proteins were purified over a nickel resin under denaturing conditions; cells expressing wild-type ubiquitin were also subjected to this purification as a reference sample. The samples were digested, separated in two dimensions of chromatography to simplify the peptide mixture, and then analyzed by LC MS/MS. This strategy yielded the identification of 1075 candidate ubiquitin-conjugated proteins and the localization of 110 ubiquitination sites on 72 proteins.

MacCoss et al. (62) used a shotgun approach to identify multiple types of PTMs in a single experiment. Human lens tissue was digested with three different proteases, and the peptides were analyzed by multidimensional LC MS/MS, revealing numerous chemically induced and biological modifications.
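
Why digesting with several proteases increases sequence coverage can be illustrated with an in silico digest. The sketch below rests on simplifying assumptions: each enzyme cuts C-terminal to a fixed set of residues (trypsin after K/R, chymotrypsin after F/W/Y, Glu-C after E), there are no missed cleavages, secondary specificities and buffer effects are ignored, and only peptides of 6 to 30 residues are counted as observable. These rules and length limits are chosen for illustration and are not the parameters used by MacCoss et al. (62).

```python
# Simplified cleavage rules: cut C-terminal to any residue in the set.
PROTEASES = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FWY"),
    "Glu-C": set("E"),
}


def digest(sequence: str, cut_after: set):
    """Yield (start, end) spans of fully cleaved peptides (no missed cleavages)."""
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cut_after:
            yield start, i + 1
            start = i + 1
    if start < len(sequence):
        yield start, len(sequence)


def covered(sequence: str, cut_after: set, min_len: int = 6, max_len: int = 30) -> set:
    """Residue positions inside peptides of an (assumed) MS-observable length."""
    positions = set()
    for start, end in digest(sequence, cut_after):
        if min_len <= end - start <= max_len:
            positions.update(range(start, end))
    return positions


def coverage_report(sequence: str) -> dict:
    """Fractional sequence coverage per enzyme and for all enzymes combined."""
    report, combined = {}, set()
    for name, sites in PROTEASES.items():
        positions = covered(sequence, sites)
        report[name] = len(positions) / len(sequence)
        combined |= positions
    report["combined"] = len(combined) / len(sequence)
    return report
```

Because the combined coverage is the union of the per-enzyme position sets, it can never be lower than the best single enzyme, which is the rationale for multi-protease peptide mapping.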

Phosphorylation

Phosphorylation is by far the most important regulatory modification in signaling research. Methods for identifying phosphorylation sites have been developed to address the problems of substoichiometric and labile phosphorylation. Techniques include precursor ion scanning (99, 101), enrichment of phosphopeptides via immobilized metal affinity chromatography (21) or via antibodies (35), and chemical modification of the phosphoamino acid for stabilization and enrichment (33, 74, 116).
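
Precursor ion scanning is performed on the mass spectrometer itself: the instrument reports only those precursors whose fragmentation yields a phosphate-specific diagnostic ion. The same selection logic can be mimicked offline on already acquired fragment spectra. The sketch below assumes a toy data format (precursor m/z mapped to a list of fragment m/z values) and uses the phosphotyrosine immonium ion, at m/z of about 216.04, as the diagnostic ion, which is the ion exploited in the method of reference 101; the 0.02 tolerance and the function names are assumptions for the example.

```python
PTYR_IMMONIUM_MZ = 216.042  # phosphotyrosine immonium ion, singly charged


def has_diagnostic_ion(fragment_mzs, target_mz: float = PTYR_IMMONIUM_MZ,
                       tol: float = 0.02) -> bool:
    """True if any fragment lies within tol of the diagnostic m/z."""
    return any(abs(mz - target_mz) <= tol for mz in fragment_mzs)


def select_precursors(spectra: dict, target_mz: float = PTYR_IMMONIUM_MZ,
                      tol: float = 0.02) -> list:
    """Return precursor m/z values whose fragment spectra contain the diagnostic ion."""
    return sorted(prec for prec, frags in spectra.items()
                  if has_diagnostic_ion(frags, target_mz, tol))


if __name__ == "__main__":
    # Toy spectra: only the first contains a fragment near m/z 216.04.
    spectra = {652.31: [175.12, 216.04, 300.15], 700.40: [175.12, 215.90, 400.20]}
    print(select_precursors(spectra))
```

Run as a script, this prints `[652.31]`; the second spectrum has a fragment at m/z 215.90, which lies outside the tolerance.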

β-elimination of phosphate from serine or threonine has been used successfully to map phosphorylation sites (33, 74). The β-elimination reaction is followed by nucleophilic attack on the resulting α,β-unsaturated carbonyl; the nucleophile allows for enrichment and also acts as a “tag” that confers a unique molecular weight on the modified amino acid. Knight et al. (49) used β-elimination to engineer new protease cleavage sites, so that the resulting peptides indicated the phosphorylation site. The caveat of these methods is that the β-elimination reaction is not very chemically specific.
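
The mass bookkeeping behind β-elimination tagging is simple enough to write out. Elimination removes H3PO4 from phosphoserine or phosphothreonine, and Michael addition of the nucleophile then adds the entire nucleophile molecule. The sketch below computes the resulting net mass shifts from monoisotopic element masses; cysteamine and 1,2-ethanedithiol serve only as example nucleophiles, and the helper names are hypothetical.

```python
# Monoisotopic element masses (Da).
ELEMENT_MASS = {"H": 1.00782503, "C": 12.0, "N": 14.0030740,
                "O": 15.9949146, "P": 30.9737620, "S": 31.9720707}


def formula_mass(formula: dict) -> float:
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


H3PO4 = formula_mass({"H": 3, "P": 1, "O": 4})  # removed by beta-elimination
H2O = formula_mass({"H": 2, "O": 1})


def shifts_after_tagging(nucleophile: dict) -> tuple:
    """Net mass change after beta-elimination plus Michael addition.

    Returns (change relative to the phosphorylated residue,
             change relative to the unmodified Ser/Thr residue).
    pSer/pThr - H3PO4 equals Ser/Thr - H2O, so the two values differ by HPO3.
    """
    added = formula_mass(nucleophile)
    return added - H3PO4, added - H2O


CYSTEAMINE = {"C": 2, "H": 7, "N": 1, "S": 1}  # example nucleophile
ETHANEDITHIOL = {"C": 2, "H": 6, "S": 2}       # example nucleophile

for name, reagent in [("cysteamine", CYSTEAMINE), ("ethanedithiol", ETHANEDITHIOL)]:
    vs_phospho, vs_unmodified = shifts_after_tagging(reagent)
    print(f"{name}: {vs_phospho:+.4f} Da vs phospho, {vs_unmodified:+.4f} Da vs unmodified")
```

For cysteamine the tagged residue is about 20.947 Da lighter than the phosphorylated residue and about 59.019 Da heavier than unmodified serine. The two values always differ by the mass of HPO3 (about 79.966 Da), a convenient consistency check. The side reactions that make β-elimination chemically nonspecific are not modeled here.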

INTEGRATED APPROACHES

Recently, several studies have been published that combine different proteomic and functional genomics methods. For example, Mootha et al. (69) used information from mitochondrial proteomics, positional cloning, and coregulation data from a large collection of mouse microarray experiments to narrow the disease gene candidates down to a single gene (see Figure 4). This gene was mutated in the families in question and was the cause of Leigh syndrome, French-Canadian type. In a related effort, a mitochondrial proteomics data set from diverse tissues was integrated with mouse microarray data to reveal unexpected tissue specificity in the makeup of this organelle (68). The same study also identified a population of proteins that are coregulated with the mitochondrial proteins but were not actually sequenced in the mitochondrial protein preparations, including nuclear factors likely to regulate mitochondrial biogenesis and good candidates for the hitherto elusive human mtDNA repair enzymes.
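
The narrowing strategy can be sketched as successive filters: intersect the genes in the linkage interval with the organellar proteome, then keep genes whose expression across microarray experiments correlates with a reference profile (for example, the average profile of known mitochondrial genes). The gene names, profiles, correlation measure, and cutoff below are hypothetical and are not the data or statistics of Mootha et al. (69).

```python
from math import sqrt


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def narrow_candidates(interval_genes, organelle_proteome, expression: dict,
                      reference, min_r: float = 0.8) -> set:
    """Intersect the linkage interval with the organellar proteome, then keep
    genes whose expression profile correlates with the reference profile."""
    shortlist = set(interval_genes) & set(organelle_proteome)
    return {gene for gene in shortlist
            if gene in expression and pearson(expression[gene], reference) >= min_r}


if __name__ == "__main__":
    interval = {"G1", "G2", "G3", "G4"}         # hypothetical linkage interval
    mito = {"G2", "G3", "G9"}                   # hypothetical organellar proteome
    reference = [1, 2, 3, 4, 5]                 # hypothetical reference profile
    expression = {"G1": [1, 2, 3, 4, 5],
                  "G2": [5, 4, 3, 2, 1],
                  "G3": [1.1, 2.0, 3.2, 3.9, 5.1]}
    print(narrow_candidates(interval, mito, expression, reference))
```

In this toy case G1 is dropped because it is not in the organellar proteome and G2 because it is anticorrelated with the reference, leaving G3 as the single candidate.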

Hazbun et al. (38) carried out an interesting experiment on essential but functionally uncharacterized yeast open reading frames. They combined fluorescent localization, protein interaction mapping by the yeast two-hybrid system and by MS, and bioinformatic analysis aimed at identifying weak structural homologs. They report that the fluorescence localization data were particularly reproducible with respect to other published studies, whereas the protein interaction data were less reproducible but offered the greatest yield in terms of functional annotation.

OUTLOOK

Proteomics is a vibrant discipline that casts a wide but still specific net to yield functionally relevant answers. Some of the early goals of proteomics, such as knowing the cellular components and the abundance of all the proteins, are close to being achieved, at least in the yeast model system (11, 28, 42). Protein interaction mapping has made great strides, especially on the computational side. Next-generation experimental technologies, such as the routine use of quantitative MS for interaction studies, should lead to much cleaner data sets in the future. The same technique also makes it possible to capture modification-dependent interactions and their dynamics.

We predict that the importance of post-translational modifications will continue to grow as more of them are discovered. It is only a question of time until many in vivo phosphorylation sites are mapped by MS, although unraveling their significance and regulation will be a huge undertaking.

Structural biology will continue to yield a steady stream of individual structures; much of the excitement will be in integrating this knowledge with lower-resolution data on higher-order structures and with whole-cell pictures of individual cells. Large-scale imaging of fluorescent fusion proteins in combination with fluorescence resonance energy transfer experiments will complement organellar proteomics by MS and allow us to assign a “cellular home” to most mammalian proteins in the near future. This will allow large-scale mapping of the dynamics of protein movement between compartments.

Developing high-quality sets of reagents is crucial and long overdue. For example, collections of full-length clones have taken unexpectedly long to materialize. Likewise, many proteomic-scale projects stall because antibodies to most proteins are not available or are of poor quality.

Quantitative MS has nearly limitless applications. In the analysis of affinity purifications, it can be used to eliminate nonspecific interactors and thereby increase confidence that the identified proteins are not false positives. In cellular signaling experiments, it can be used to determine, for instance, which proteins show increased phosphorylation after stimulation with a growth factor. It can also compare the protein complement of normal versus cancerous tissues and establish differences due to the tumorigenic phenotype. Quantitative experiments could be used in drug discovery or as a first phase in experimental drug testing. Similarly to DNA microarrays, quantitative proteomics can be used to study protein expression profiles.
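
As an example of how quantitative MS can flag nonspecific interactors, the sketch below assumes a SILAC-style design in which the bait purification comes from heavy-labeled cells and a control purification from light cells: specific binders show high heavy/light ratios, whereas background binders appear at ratios near 1. It also computes the expected peak spacing for leucine-D3 labeling, the label used in Figure 2. The median summary, the ratio cutoff, and the intensities are assumptions for illustration.

```python
from statistics import median

DEUTERIUM_MINUS_HYDROGEN = 2.01410178 - 1.00782503  # mass difference D - H (Da)


def leu_d3_spacing(n_leucines: int, charge: int) -> float:
    """Expected m/z gap between the light and Leu-D3-labeled forms of a peptide."""
    return n_leucines * 3 * DEUTERIUM_MINUS_HYDROGEN / charge


def protein_ratio(peptide_pairs) -> float:
    """Median heavy/light ratio from (heavy, light) peak intensities."""
    return median(heavy / light for heavy, light in peptide_pairs if light > 0)


def classify(ratio: float, cutoff: float = 3.0) -> str:
    """Hypothetical cutoff separating specific binders from background."""
    return "specific" if ratio >= cutoff else "background"
```

A peptide carrying two leucines at charge 2+ would therefore show its heavy partner about 3.02 m/z units above the light peak. A protein whose peptides give heavy/light ratios of roughly 9 would be called specific, while one with ratios near 1 would be discarded as a contaminant present in both purifications.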

Whether proteomics becomes an intellectually coherent discipline, rather than a sprawling agglomeration of separate technologies, is difficult to predict. At a minimum, it is a useful organizing principle that encourages researchers to think in an interdisciplinary manner and to venture beyond the universe delimited by the potential of their favorite tool or process. It already delivers extremely useful information and resources to the life sciences. Proteomics also offers spectacular challenges and technical progress to match. For all of these reasons we hope and are confident that proteomics will become a key discipline contributing to our understanding of the workings of the cell in health and disease.

ACKNOWLEDGMENTS

Work in the Center for Experimental BioInformatics (CEBI) is supported by a generous grant from the Danish National Research Foundation. C.L. de Hoog is supported by a fellowship from the Canadian Institutes of Health Research. V. Mootha kindly provided Figure 4.

The Annual Review of Genomics and Human Genetics is online at http://genom.annualreviews.org

LITERATURE CITED

1. Abola E, Kuhn P, Earnest T, Stevens RC. 2000. Automation of X-ray crystallography. Nat. Struct. Biol. 7(Suppl.):973–77

2. Adam GC, Sorensen EJ, Cravatt BF. 2002. Chemical strategies for functional proteomics. Mol. Cell. Proteomics 1:781–90

3. Adkins JN, Varnum SM, Auberry KJ, Moore RJ, Angell NH, et al. 2002. Toward a human blood serum proteome: analysis by multidimensional separation coupled with mass spectrometry. Mol. Cell. Proteomics 1:947–55

4. Aebersold R, Mann M. 2003. Mass spectrometry-based proteomics. Nature 422:198–207

5. Aebersold R, Watts JD. 2002. The need for national centers for proteomics. Nat. Biotechnol. 20:651

6. Agaton C, Galli J, Hoiden Guthenberg I, Janzon L, Hansson M, et al. 2003. Affinity proteomics for systematic protein profiling of chromosome 21 gene products in human tissues. Mol. Cell. Proteomics 2:405–14

7. Andersen JS, Wilkinson CJ, Mayor T, Mortensen P, Nigg EA, Mann M. 2003. Proteomic characterization of the human centrosome by protein correlation profiling. Nature 426:570–74

8. Anderson DC, Li W, Payan DG, Noble WS. 2003. A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. 2:137–46

9. Anderson NL, Anderson NG. 2002. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1:845–67

10. Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW. 2001. BIND–The Biomolecular Interaction Network Database. Nucl. Acids Res. 29:242–45

11. Bader GD, Heilbut A, Andrews B, Tyers M, Hughes T, Boone C. 2003. Functional genomics and proteomics: charting a multidimensional map of the yeast cell. Trends Cell Biol. 13:344–56

12. Bader JS, Chaudhuri A, Rothberg JM, Chant J. 2004. Gaining confidence in high-throughput protein interaction networks. Nat. Biotechnol. 22:78–85

13. Bailey SN, Wu RZ, Sabatini DM. 2002. Applications of transfected cell microarrays in high-throughput drug discovery. Drug Discov. Today 7:S113–18

14. Baldwin MA. 2004. Protein identification by mass spectrometry: issues to be considered. Mol. Cell. Proteomics 3:1–9

15. Blagoev B, Kratchmarova I, Ong SE, Nielsen M, Foster LJ, Mann M. 2003. A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling. Nat. Biotechnol. 21:315–18

16. Braun P, LaBaer J. 2003. High throughput protein production for functional proteomics. Trends Biotechnol. 21:383–88

17. Brunet S, Thibault P, Gagnon E, Kearney P, Bergeron JJ, Desjardins M. 2003. Organelle proteomics: looking at less to see more. Trends Cell Biol. 13:629–38

18. Coates PJ, Hall PA. 2003. The yeast two-hybrid system for identifying protein-protein interactions. J. Pathol. 199:4–7

19. Dahlberg AE. 2001. Ribosome structure. The ribosome in action. Science 292:868–69

20. Elia AE, Cantley LC, Yaffe MB. 2003. Proteomic screen finds pSer/pThr-binding domain localizing Plk1 to mitotic substrates. Science 299:1228–31

21. Ficarro SB, McCleland ML, Stukenberg PT, Burke DJ, Ross MM, et al. 2002. Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat. Biotechnol. 20:301–5

22. Fields S, Song O. 1989. A novel genetic system to detect protein-protein interactions. Nature 340:245–46

23. Florens L, Washburn MP, Raine JD, Anthony RM, Grainger M, et al. 2002. A proteomic view of the Plasmodium falciparum life cycle. Nature 419:520–26

24. Forler D, Kocher T, Rode M, Gentzel M, Izaurralde E, Wilm M. 2003. An efficient protein complex purification method for functional proteomics in higher eukaryotes. Nat. Biotechnol. 21:89–92

24a. Foster LJ, de Hoog CL, Mann M. 2003. Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors. Proc. Natl. Acad. Sci. USA 100:5813–18

25. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. 2002. Evolutionary rate in the protein interaction network. Science 296:750–52

26. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, et al. 2002. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415:141–47

27. Gevaert K, Goethals M, Martens L, Van Damme J, Staes A, et al. 2003. Exploring proteomes and analyzing protein processing by mass spectrometric identification of sorted N-terminal peptides. Nat. Biotechnol. 21:566–69

28. Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, et al. 2003. Global analysis of protein expression in yeast. Nature 425:737–41

29. Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, et al. 2003. The International HapMap Project. Nature 426:789–96

30. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, et al. 2003. A protein interaction map of Drosophila melanogaster. Science 302:1727–36

31. Glokler J, Angenendt P. 2003. Protein and antibody microarray technology. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 797:229–40

32. Godl K, Wissing J, Kurtenbach A, Habenberger P, Blencke S, et al. 2003. An efficient proteomics method to identify the cellular targets of protein kinase inhibitors. Proc. Natl. Acad. Sci. USA 100:15434–39

33. Goshe MB, Conrads TP, Panisko EA, Angell NH, Veenstra TD, Smith RD. 2001. dePhosphoprotein isotope-coded affinity tag approach for isolating and quantitating phosphopeptides in proteome-wide analyses. Anal. Chem. 73:2578–86

34. Graves PR, Haystead TA. 2002. Molecular biologist’s guide to proteomics. Microbiol. Mol. Biol. Rev. 66:39–63

35. Grønborg M, Kristiansen TZ, Stensballe A, Andersen JS, Ohara O, et al. 2002. A mass spectrometry-based proteomic approach for identification of serine/threonine-phosphorylated proteins by enrichment with phospho-specific antibodies: identification of a novel protein, Frigg, as a protein kinase A substrate. Mol. Cell. Proteomics 1:517–27

36. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. 1999. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17:994–99

37. Hansen KC, Schmitt-Ulms G, Chalkley RJ, Hirsch J, Baldwin MA, Burlingame AL. 2003. Mass spectrometric analysis of protein mixtures at low levels using cleavable 13C-isotope-coded affinity tag and multidimensional chromatography. Mol. Cell. Proteomics 2:299–314

38. Hazbun TR, Malmstrom L, Anderson S, Graczyk BJ, Fox B, et al. 2003. Assigning function to yeast proteins by integration of technologies. Mol. Cell 12:1353–65

39. Hegde PS, White IR, Debouck C. 2003. Interplay of transcriptomics and proteomics. Curr. Opin. Biotechnol. 14:647–51

40. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, et al. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415:180–83

41. Horning S, Malek R, Wieghaus A, Senko MW, Syka JEP. 2003. A hybrid two-dimensional quadrupole ion trap/Fourier transform ion cyclotron mass spectrometer: accurate mass and high resolution at a chromatography timescale. Presented at Am. Soc. Mass Spectrom. Conf. Mass Spectrom. Allied Top., 51st, Montreal, Canada

42. Huh WK, Falvo JV, Gerke LC, Carroll AS, Howson RW, et al. 2003. Global analysis of protein localization in budding yeast. Nature 425:686–91

43. Huynen MA, Snel B, von Mering C, Bork P. 2003. Function prediction and protein networks. Curr. Opin. Cell Biol. 15:191–98

44. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 98:4569–74

45. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, et al. 2003. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302:449–53

46. Jeong H, Mason SP, Barabasi AL, Oltvai ZN. 2001. Lethality and centrality in protein networks. Nature 411:41–42

47. Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, et al. 2004. EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 14:160–69

48. Keller A, Nesvizhskii AI, Kolker E, Aebersold R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74:5383–92

49. Knight ZA, Schilling B, Row RH, Kenski DM, Gibson BW, Shokat KM. 2003. Phosphospecific proteolysis for mapping sites of protein phosphorylation. Nat. Biotechnol. 21:1047–54

50. Kononen J, Bubendorf L, Kallioniemi A, Barlund M, Schraml P, et al. 1998. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat. Med. 4:844–47

51. Krijgsveld J, Ketting RF, Mahmoudi T, Johansen J, Artal-Sanz M, et al. 2003. Metabolic labeling of C. elegans and D. melanogaster for quantitative proteomics. Nat. Biotechnol. 21:927–31

52. Kumar A, Agarwal S, Heyman JA, Matson S, Heidtman M, et al. 2002. Subcellular localization of the yeast proteome. Genes Dev. 16:707–19

53. Kuruvilla FG, Shamji AF, Sternson SM, Hergenrother PJ, Schreiber SL. 2002. Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays. Nature 416:653–57

54. Kuster B, Mortensen P, Andersen JS, Mann M. 2001. Mass spectrometry allows direct identification of proteins in large genomes. Proteomics 1:641–50

55. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921

56. Lasonder E, Ishihama Y, Andersen JS, Vermunt AM, Pain A, et al. 2002. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 419:537–42

57. Li J, Steen H, Gygi SP. 2003. Protein profiling with cleavable isotope-coded affinity tag (cICAT) reagents: the yeast salinity stress response. Mol. Cell. Proteomics 2:1198–204

58. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, et al. 2004. A map of the interactome network of the Metazoan C. elegans. Science 303:540–43

59. Lill J. 2003. Proteomic tools for quantitation by mass spectrometry. Mass Spectrom. Rev. 22:182–94

60. Lueking A, Possling A, Huber O, Beveridge A, Horn M, et al. 2003. A nonredundant human protein chip for antibody screening and serum profiling. Mol. Cell. Proteomics 2:1342–49

61. Lundgren DH, Eng J, Wright ME, Han DK. 2003. PROTEOME-3D: an interactive bioinformatics tool for large-scale data exploration and knowledge discovery. Mol. Cell. Proteomics 2:1164–76

62. MacCoss MJ, McDonald WH, Saraf A, Sadygov R, Clark JM, et al. 2002. Shotgun identification of protein modifications from protein complexes and lens tissue. Proc. Natl. Acad. Sci. USA 99:7900–5

63. Mann M, Jensen ON. 2003. Proteomic analysis of post-translational modifications. Nat. Biotechnol. 21:255–61

64. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. 1999. Detecting protein function and protein-protein interactions from genome sequences. Science 285:751–53

65. Marshall AG, Hendrickson CL, Jackson GS. 1998. Fourier transform ion cyclotron resonance mass spectrometry: a primer. Mass Spectrom. Rev. 17:1–35

66. Medalia O, Weber I, Frangakis AS, Nicastro D, Gerisch G, Baumeister W. 2002. Macromolecular architecture in eukaryotic cells visualized by cryoelectron tomography. Science 298:1209–13

67. Michaud GA, Salcius M, Zhou F, Bangham R, Bonin J, et al. 2003. Analyzing antibody specificity with whole proteome microarrays. Nat. Biotechnol. 21:1509–12

68. Mootha VK, Bunkenborg J, Olsen JV, Hjerrild M, Wisniewski JR, et al. 2003. Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria. Cell 115:629–40

69. Mootha VK, Lepage P, Miller K, Bunkenborg J, Reich M, et al. 2003. Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc. Natl. Acad. Sci. USA 100:605–10

70. Nesvizhskii AI, Keller A, Kolker E, Aebersold R. 2003. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75:4646–58

71. Neubauer G, King A, Rappsilber J, Calvio C, Watson M, et al. 1998. Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex. Nat. Genet. 20:46–50

72. Nielsen UB, Cardone MH, Sinskey AJ, MacBeath G, Sorger PK. 2003. Profiling receptor tyrosine kinase activation by using Ab microarrays. Proc. Natl. Acad. Sci. USA 100:9330–35

73. Nishizuka S, Charboneau L, Young L, Major S, Reinhold WC, et al. 2003. Proteomic profiling of the NCI-60 cancer cell lines using new high-density reverse-phase lysate microarrays. Proc. Natl. Acad. Sci. USA 100:14229–34

74. Oda Y, Nagasu T, Chait BT. 2001. Enrichment analysis of phosphorylated proteins as a tool for probing the phosphoproteome. Nat. Biotechnol. 19:379–82

75. Oda Y, Owa T, Sato T, Boucher B, Daniels S, et al. 2003. Quantitative chemical proteomics for identifying candidate drug targets. Anal. Chem. 75:2159–65

75a. Olsen JV, Ong SE, Mann M. 2004. Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteomics http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15034119

76. Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, et al. 2002. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1:376–86

77. Ong SE, Kratchmarova I, Mann M. 2003. Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC). J. Proteome Res. 2:173–81

78. Orchard S, Hermjakob H, Apweiler R. 2003. The proteomics standards initiative. Proteomics 3:1374–76

79. Paigen K, Eppig JT. 2000. A mouse phenome project. Mamm. Genome 11:715–17

80. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP. 2003. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2:43–50

81. Peng J, Schwartz D, Elias JE, Thoreen CC, Cheng D, et al. 2003. A proteomics approach to understanding protein ubiquitination. Nat. Biotechnol. 21:921–26

82. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, et al. 2002. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359:572–77

83. Phizicky E, Bastiaens PI, Zhu H, Snyder M, Fields S. 2003. Protein analysis on a proteomic scale. Nature 422:208–15

84. Pieper R, Gatlin CL, Makusky AJ, Russo PS, Schatz CR, et al. 2003. The human serum proteome: display of nearly 3700 chromatographically separated protein spots on two-dimensional electrophoresis gels and identification of 325 distinct proteins. Proteomics 3:1345–64

85. Ranish JA, Yi EC, Leslie DM, Purvine SO, Goodlett DR, et al. 2003. The study of macromolecular complexes by quantitative proteomics. Nat. Genet. 33:349–55

86. Rappsilber J, Ryder U, Lamond AI, Mann M. 2002. Large-scale proteomic analysis of the human spliceosome. Genome Res. 12:1231–45

87. Reboul J, Vaglio P, Rual JF, Lamesch P, Martinez M, et al. 2003. C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nat. Genet. 34:35–41

88. Sali A, Glaeser R, Earnest T, Baumeister W. 2003. From words to literature in structural proteomics. Nature 422:216–25

89. Salwinski L, Eisenberg D. 2003. Computational methods of analysis of protein-protein interactions. Curr. Opin. Struct. Biol. 13:377–82

90. Schirle M, Heurtier MA, Kuster B. 2003. Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 2:1297–305

91. Schulze WX, Mann M. 2004. A novel proteomic screen for peptide-protein interactions. J. Biol. Chem. 279:10756–64

92. Schweitzer B, Kingsmore SF. 2002. Measuring proteins on microarrays. Curr. Opin. Biotechnol. 13:14–19

93. Schweitzer B, Predki P, Snyder M. 2003. Microarrays to characterize protein interactions on a whole-proteome scale. Proteomics 3:2190–99

94. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, et al. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13:2498–504

95. Sidhu SS, Bader GD, Boone C. 2003. Functional genomics of intracellular peptide recognition domains with combinatorial biology methods. Curr. Opin. Chem. Biol. 7:97–102

96. Specht KM, Shokat KM. 2002. The emerging power of chemical genetics. Curr. Opin. Cell Biol. 14:155–59

97. Speers AE, Cravatt BF. 2004. Chemical strategies for activity-based proteomics. ChemBioChem 5:41–47

98. Stagljar I. 2003. Finding partners: emerging protein interaction technologies applied to signaling networks. Sci. STKE 2003:pe56

99. Steen H, Mann M. 2002. A new derivatization strategy for the analysis of phosphopeptides by precursor ion scanning in positive ion mode. J. Am. Soc. Mass Spectrom. 13:996–1003

100. Steen H, Mann M. 2004. The abc’s (and xyz’s) of peptide sequencing. Nat. Rev. Mol. Cell Biol. In press

101. Steen H, Pandey A, Andersen JS, Mann M. 2002. Analysis of tyrosine phosphorylation sites in signaling molecules by a phosphotyrosine-specific immonium ion scanning method. Sci. STKE 2002:PL16

102. Taylor CF, Paton NW, Garwood KL, Kirby PD, Stead DA, et al. 2003. A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nat. Biotechnol. 21:247–54

103. Tirumalai RS, Chan KC, Prieto DA, Issaq HJ, Conrads TP, Veenstra TD. 2003. Characterization of the low molecular weight human serum proteome. Mol. Cell. Proteomics 2:1096–103

104. Tong AH, Drees B, Nardelli G, Bader GD, Brannetti B, et al. 2002. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 295:321–24

105. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, et al. 2000. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403:623–27

106. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, et al. 2001. The sequence of the human genome. Science 291:1304–51

107. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, et al. 2002. Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417:399–403

108. Wilkins MR, Pasquali C, Appel RD, Ou K, Golaz O, et al. 1996. From proteins to proteomes: large scale protein identification by two-dimensional electrophoresis and amino acid analysis. Biotechnology 14:61–65

109. Winssinger N, Ficarro S, Schultz PG, Harris JL. 2002. Profiling protein function with small molecule microarrays. Proc. Natl. Acad. Sci. USA 99:11139–44

110. Wouters FS, Verveer PJ, Bastiaens PI. 2001. Imaging biochemistry inside cells. Trends Cell Biol. 11:203–11

111. Wulfkuhle JD, Liotta LA, Petricoin EF. 2003. Proteomic applications for the early detection of cancer. Nat. Rev. Cancer 3:267–75

112. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D. 2002. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucl. Acids Res. 30:303–5

113. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G. 2002. MINT: a Molecular INTeraction database. FEBS Lett. 513:135–40

114. Zhang H, Li XJ, Martin DB, Aebersold R. 2003. Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat. Biotechnol. 21:660–66

115. Zhou H, Ranish JA, Watts JD, Aebersold R. 2002. Quantitative proteome analysis by solid-phase isotope tagging and mass spectrometry. Nat. Biotechnol. 20:512–15

116. Zhou H, Watts JD, Aebersold R. 2001. A systematic approach to the analysis of protein phosphorylation. Nat. Biotechnol. 19:375–78

117. Zhou Z, Licklider LJ, Gygi SP, Reed R. 2002. Comprehensive proteomic analysis of the human spliceosome. Nature 419:182–85

118. Zhu H, Bilgin M, Bangham R, Hall D, Casamayor A, et al. 2001. Global analysis of protein activities using proteome chips. Science 293:2101–5

119. Zhu H, Klemic JF, Chang S, Bertone P, Casamayor A, et al. 2000. Analysis of yeast protein kinases using protein chips. Nat. Genet. 26:283–89

120. Zhu H, Snyder M. 2003. Protein chip technology. Curr. Opin. Chem. Biol. 7:55–63

121. Ziauddin J, Sabatini DM. 2001. Microarrays of cells expressing defined cDNAs. Nature 411:107–10

Figure 1 Schematic of the relationships among the different “omics” disciplines, following the flow of information from genome through transcript to protein and small molecule. Moving from genomics to proteomics, the complexity increases dramatically whereas the maturity of the technology decreases.

Figure 2 Uses of stable isotope labeling by amino acids in cell culture (SILAC) for quantitative proteomics. (A) One population of cells is labeled with LeuD3, a deuterium-substituted leucine (blue proteins), and another population is labeled with normal Leu (red proteins). The Leu-labeled cells were treated with a cholesterol-disrupting agent, lysed, combined with lysates of LeuD3-labeled untreated cells, and used to prepare a detergent-resistant fraction. Because rafts in the drug-treated cells lost their structural integrity, they are no longer purified in the detergent-resistant fraction, whereas nonraft contaminants originating from treated and untreated samples will copurify. Tryptic peptides were then prepared from isolated fractions and analyzed by high-performance liquid chromatography-mass spectrometry (HPLC-MS); an example mass spectrum is shown. The heights of the labeled peaks can be used to calculate the difference in abundance between the two samples. (B) Examples of extracted ion chromatograms (XIC) for peptides from flotillin 1 and �-tubulin, two proteins identified in this study. Figure adapted from Reference 24a. (Copyright 2003 National Academy of Sciences, U.S.A.)

Figure 3 Analysis of the current C. elegans interaction map network. Nodes (representing proteins) are colored according to their phylogenetic class: ancient (has ortholog in yeast; red), multicellular (has ortholog in Drosophila, Arabidopsis, or humans; yellow), and worm (no detectable ortholog outside C. elegans; blue). Edges represent protein-protein interactions. Interactions are colored according to high confidence (“core”) and lower confidence (“noncore”) in a large-scale yeast two-hybrid screen. “Scaffold” denotes data from previous small-scale yeast two-hybrid screens. The inset highlights a small part of the network. Figure from Reference 58. (Copyright 2004 American Association for the Advancement of Science, U.S.A. Reproduced with permission.)

Figure 4 Integrated approaches. By combining proteomics data, genomic positioning data for human diseases, and DNA microarray data, the field of disease gene candidates can be narrowed down from many potential choices to a single candidate gene. This strategy was successfully applied to the discovery of a gene causing Leigh syndrome, French-Canadian type (69).
