Applications of Mass Spectrometry to Proteomics

Raghothama Chaerkady, Ph.D.

McKusick-Nathans Institute of Genetic Medicine and the Department of Biological Chemistry

Why Proteomics?

One Gene, Many Proteins

[Figure: Gene → mRNA → Protein. One gene gives rise to several mRNAs (mRNA 1 to mRNA 5) through alternative splicing, and to many proteins (Protein 1 to Protein 6 ... Protein i) through co/post-translational processing.]

Why Mass Spectrometry?

http://nobelprize.org

Protein Characterization by Mass Spectrometry

• Very sensitive: Less than 1 pmol of protein is required for identification (50 ng of a 50 kDa protein)
• Mixtures of proteins (hundreds to thousands of proteins) can be analyzed
• Proteins with a blocked N terminus can be identified
• Post-translational modifications such as phosphorylation can be identified and localized
• Quantitative mass spectrometry has numerous applications
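The sensitivity figure above is a unit conversion, and a minimal sketch makes the arithmetic explicit. The function name is hypothetical and only illustrates the calculation.

```python
def protein_mass_ng(amount_pmol, mol_weight_kda):
    """Mass in nanograms for a given amount (pmol) of a protein of a given size (kDa)."""
    grams_per_mol = mol_weight_kda * 1000.0   # 1 kDa corresponds to 1000 g/mol
    picograms = amount_pmol * grams_per_mol   # 1 pmol x 1 g/mol = 1 pg
    return picograms / 1000.0                 # 1000 pg = 1 ng


# 1 pmol of a 50 kDa protein
print(protein_mass_ng(1.0, 50.0))
```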


Electrospray Tandem Mass Spectrometry (MS/MS)

• MS1 scan: detection of all the peptides in a mixture
• MS/MS: one of the peptides is isolated and subjected to collision-induced dissociation for sequencing
• Tandem mass spectrometry provides the actual sequence of the peptide

Peptide Sequencing by MS/MS

[Figure: one peptide is selected and fragmented by collision. The fragments containing the carboxyl-terminus (Y3 to Y7) appear as peaks in the MS/MS spectrum. Axes: relative intensity (%) vs. m/z.]
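As a rough sketch of how the C-terminal (y) fragment series relates to a sequence, the code below sums monoisotopic residue masses from the C-terminus and adds one water and one proton. The peptide "SAMPLER" is a made-up example, not one from these slides, and only singly charged, unmodified fragments are assumed.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O carried by a C-terminal fragment
PROTON = 1.007276   # charge carrier for a singly charged ion


def y_ions(peptide):
    """Return singly charged y-ion m/z values, y1 first."""
    mz_values = []
    running = WATER + PROTON
    # Walk from the C-terminus; the first residue is skipped because
    # the full-length ion is the precursor, not a fragment.
    for residue in reversed(peptide[1:]):
        running += RESIDUE_MASS[residue]
        mz_values.append(round(running, 3))
    return mz_values


# Hypothetical example peptide
for n, mz in enumerate(y_ions("SAMPLER"), start=1):
    print(f"y{n}: {mz}")
```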

[Figure: biological changes (morphogenesis) lead from Proteome 1 through Proteome 2, Proteome 3 and Proteome 4 to Proteome 'n'. Profiling of proteins compares Proteome 1 with Proteome 'n'.]

Genomics and Proteomics

mRNA and Protein Correlation

Nature 422, 198-207 (Insight review)

Generic mass spectrometry (MS)-based proteomics experiment.

Mass spectrometers used in proteome research.

In reflector time-of-flight (TOF) instruments, the ions are accelerated to high kinetic energy and are separated along a flight tube as a result of their different velocities. The ions are turned around in a reflector, which compensates for slight differences in kinetic energy, and then impinge on a detector that amplifies and counts arriving ions.
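The velocity separation described above follows from every ion of charge z receiving the same kinetic energy, zeV, so heavier ions travel more slowly. The sketch below computes an idealized flight time. The 20 kV accelerating voltage and 1 m flight path are assumed illustrative values, and the reflector is ignored.

```python
import math

DALTON_KG = 1.66053906660e-27        # kg per Da
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def flight_time_us(mass_da, charge, accel_voltage_v=20000.0, tube_length_m=1.0):
    """Idealized linear TOF flight time in microseconds (assumed instrument values)."""
    kinetic_energy = charge * ELEMENTARY_CHARGE_C * accel_voltage_v  # joules
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON_KG))
    return tube_length_m / velocity * 1e6


# Flight time scales with the square root of m/z
for mass in (500.0, 1000.0, 2000.0):
    print(mass, round(flight_time_us(mass, 1), 2))
```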

The TOF-TOF instrument incorporates a collision cell between two TOF sections. Ions of one mass-to-charge (m/z) ratio are selected in the first TOF section, fragmented in the collision cell, and the masses of the fragments are separated in the second TOF section.

Quadrupole mass spectrometers select by time-varying electric fields between four rods, which permit a stable trajectory only for ions of a particular desired m/z. Again, ions of a particular m/z are selected in a first section (Q1), fragmented in a collision cell (q2), and the fragments separated in Q3. In the linear ion trap, ions are captured in a quadrupole section, depicted by the red dot in Q3. They are then excited via a resonant electric field and the fragments are scanned out, creating the tandem mass spectrum.

The quadrupole TOF instrument combines the front part of a triple quadrupole instrument with a reflector TOF section for measuring the mass of the ions.

The (three-dimensional) ion trap captures the ions as in the case of the linear ion trap, fragments ions of a particular m/z, and then scans out the fragments to generate the tandem mass spectrum.

The FT-MS instrument also traps the ions, but does so with the help of strong magnetic fields. The figure shows the combination of FT-MS with the linear ion trap for efficient isolation, fragmentation and fragment detection in the FT-MS section.

During ion detection, both the central electrode and deflector are maintained at very stable voltages so that no mass drift can take place. The outer electrode is split in half at z=0, allowing the ion image current in the axial direction to be collected. The image current on each half of the outer electrode is differentially amplified and then undergoes analog-to-digital conversion before processing using the fast Fourier transform algorithm.

MS/MS spectrum (sequencing)

[Figure: instruments shown: Micromass QTOF, Finnigan LCQ Deca, LTQ-Orbitrap XL.]

1D Liquid Chromatography Setup

[Figure: 1D liquid chromatography setup. Labeled components: peptide mixture from pump, diverter valve, manual 3-port valve for peak "parking", split line, C18 analytical column, emitter with 8 µm tip. Flow: 80 nL/min with valve open.]

[Figure: a needle with a very small opening, containing a reverse-phase column for separation, shown with a matchstick.]

Multidimensional Liquid Chromatography (MudPIT) Setup

Liu, H. et al. (2002)

[Figure: MudPIT setup coupled to MS.]

Offline Fractionation in the First Dimension

[Figure: labeled components: pump, auto sampler, manual injector, SCX column, UV detector, fraction collector, reverse phase LCMS.]

Automated nanoLC-MS/MS

• Purification of sample
• Very complex samples can be analyzed
• Identical peptides will elute in one peak (enhanced signal)
• Automated
• Many proteins can be identified in one run (100-2,000 proteins)

[Figure: total ion chromatogram (TIC), relative intensity vs. time (30-120 minutes), with two MS/MS spectra (relative intensity vs. m/z) taken from it.]

[MS/MS spectrum 1: peptide ANTFYTCFLGTSSLAGFK. Labeled y ions: y2 294.20, y3 351.22, y4 422.26, y5 535.35, y6 622.38, y7 709.42, y8 810.47, y9 867.50, y10 980.58, y11 1127.66, y12 1287.70, y13 1388.75, y14 1551.85, y15 1698.94, y16 1800.02.]

[MS/MS spectrum 2: peptide VGCSMWYWK. Labeled y ions: y2 333.21, y3 496.29, y4 682.38, y5 813.41, y6 900.46, y7 1060.48, y8 1117.52.]

Coupling Liquid Chromatography to Tandem MS
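Reading a sequence from an MS/MS spectrum amounts to matching the spacing between consecutive y ions to residue masses. The sketch below does this for the y2 to y8 values labeled for VGCSMWYWK. It assumes cysteine is carbamidomethylated (+57.02146 Da), which is not stated on the slide, uses an arbitrary 0.05 Da tolerance, and cannot tell leucine from isoleucine.

```python
# Monoisotopic residue masses in Da. "C" is ASSUMED carbamidomethylated
# (103.00919 + 57.02146); "L" stands for leucine or isoleucine.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(y_mz, tol=0.05):
    """Residues implied by consecutive y-ion spacings, in N- to C-terminal order."""
    residues = []
    for low, high in zip(y_mz, y_mz[1:]):
        delta = high - low
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
        # "?" marks a spacing with no match or with more than one match (e.g. K vs Q)
        residues.append(matches[0] if len(matches) == 1 else "?")
    # Higher y ions extend toward the N-terminus, so reverse the order read
    return "".join(reversed(residues))


y_series = [333.21, 496.29, 682.38, 813.41, 900.46, 1060.48, 1117.52]
print(read_ladder(y_series))  # GCSMWY, i.e. residues 2-7 of VGCSMWYWK
```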

Quantitative Proteomics

• Based on difference in intensity

• By relative quantitation using MS

Quantitative Proteomics

• Based on difference in intensity
  – 1D gels
  – 2D gels
  – Use of fluorescence-based methods

1D Gel-based Comparison

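[Figure: 1D gel of an anti-pTyr immunoprecipitation (IP: anti-pTyr), lanes labeled by EGF treatment. Molecular weight markers: 200, 110, 73, 47, 28.]

2D Gel-based Comparison

[Figure: 2D gels of normal and cancer samples.]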

2D Gels - Limitations

• Sample preparation: a lot of optimization required
• Loading capacity limited
• Do not resolve very small (<10 kDa) or large (>100 kDa) proteins
• Do not resolve hydrophobic (e.g. membrane) proteins
• Issues with reproducibility

Quantitative Proteomics

• Fluorescence-based quantitation
  – DIGE (Difference in-gel electrophoresis)

DIGE

• Samples to be compared are labeled with Cy3 (green) and Cy5 (red)
• Samples are mixed and resolved by 2D gels
• Fluorescence is measured and quantitated

2D-DIGE of Pancreatic Juice

[Figure: 2D-DIGE gel images with panels labeled Cancer, Normal, Cy5, Cy3 and Cy3-Cy5.]

Quantitative Proteomics

• By relative quantitation using MS
  – in vitro labeling
    • 18O-labeling
    • Peptide mass tagging (ICAT)
  – in vivo labeling
    • Labeling with stable isotope containing amino acids (SILAC)

18O-labeling

[Figure: 18O-labeling workflow. Normal (N): excise gel band, in-gel digestion with H2 16O water. Cancer (C): excise gel band, in-gel digestion with H2 18O water. The two digests are mixed 1:1 and analyzed by MS.]

18O-labeling

Isotope-Coded Affinity Tag (ICAT)

[Figure: structure of ICAT reagent, with parts numbered 1 to 4.]

Stable Isotope Labeling in Cell Culture (SILAC) for Protein Quantitation

• Mammalian cell culture models are used for studying a number of biological processes

• In the SILAC approach, cells are grown continuously in media containing one or more stable isotopes (e.g. 13C). All the proteins in the cells are heavier and can be used to ‘mark’ a given state in mass spectrometric analysis
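How much heavier a labeled peptide appears depends on the number of labeled residues and the charge state. A minimal sketch, assuming a 13C6-labeled amino acid (six carbons replaced by 13C) and an illustrative doubly charged peptide at m/z 772.40 with one labeled residue:

```python
C13_MINUS_C12 = 1.0033548  # Da, mass difference between 13C and 12C


def silac_mz_shift(n_labeled_residues, heavy_carbons_per_residue=6, charge=2):
    """m/z spacing between the light and heavy forms of a peptide."""
    mass_shift = n_labeled_residues * heavy_carbons_per_residue * C13_MINUS_C12
    return mass_shift / charge


light_mz = 772.40  # illustrative light-peptide m/z (assumed 2+)
heavy_mz = light_mz + silac_mz_shift(1, charge=2)
print(round(heavy_mz, 2))
```

A 6 Da label therefore shows up as a spacing of about 3 m/z units for a 2+ ion and about 2 m/z units for a 3+ ion.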


Time Course of Heavy Amino Acid Incorporation

Advantages of the SILAC Method

• Simple
• In vivo
• Does not require any extra processing steps
• All proteins are uniformly labeled
• Complete and predictable incorporation
• Choice of labeled amino acids
• De novo sequencing can be performed

General strategy for stable isotope labeling by amino acids

SILAC for Quantitation of Secreted Proteins

[Figure: cultures grown with 12C6 labeled lysine and with 13C6 labeled lysine are sampled on Day 1 and on Day 4, Day 7 or Day 9. Samples are mixed pairwise (Days 1+4, Days 1+7, Days 1+9), run on SDS-PAGE, eluted, digested with trypsin and analyzed by MS (relative intensity vs. m/z).]

Profile of Proteins Secreted by Adipocytes

[Figure: three MS panels over m/z 770-778, with isotope clusters starting near 772.40 and 775.40. Ratios: 1:0.5, 1:0.4, 1:0.24.]

[Figure: three MS panels over m/z 800-808, with isotope clusters starting near 802.43 and 805.44. Ratios: 1:2.4, 1:2.7, 1:1.6.]

Fibronectin expression is downregulated

Collagen alpha 3 expression is upregulated

Studying Dynamics Using SILAC

[Figure: mass spectrum (intensity vs. m/z, 786-798) of the peptide VSHLLGINVTDFTR, with labeled forms separated by 6 Da and 4 Da.]

Biomarker Discovery Using Proteomics

• Ideal targets for biomarker
  – Protein (differentially expressed proteins)
  – DNA (mutations, methylation)
  – RNA (differentially expressed genes)
• Biological specimen
  – Tissue (whole tissue or isolated tumor cells)
  – Pancreatic juice
  – Serum
  – Plasma

Physiological proteome of human pancreatic juice

• Collection of pancreatic juice (cancer) during surgery
• Run on 1D gel
• LC-MS/MS analysis
• Bioinformatic analysis of identified proteins
• Compare data with known microarray data

Differential Proteomic Analysis of Pancreatic Cancer Secretome

• In vivo labeling with both arginine and lysine
• Collect conditioned media and concentrate with Centricon (3,000 Da MWCO)
• Normalize and mix in 1:1 ratio
• Resolve proteins by 1D gel electrophoresis
• Excise bands and digest proteins by trypsin
• Identify proteins by nanoLC-MS/MS (2x30 bands)
• Verify identified proteins (manually)
• Relative quantitation of identified proteins (manually)

Quantitation of Secreted Proteins

Post-translational Modifications

• Peptides can have a number of modifications
• During database searching, a variable modification has to be specified – otherwise, no ‘hit’
• Common PTMs are phosphorylation, acetylation, ubiquitination, glycosylation etc.
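Why an unspecified modification yields no hit can be sketched with a toy precursor-mass match. This is not how any particular search engine is implemented: the function names, the 0.01 Da tolerance and the peptide mass of 1000.5 Da are all made up for illustration. Only the phosphorylation mass gain (HPO3, 79.96633 Da) is a standard value.

```python
PHOSPHO = 79.96633  # Da, monoisotopic mass gain of phosphorylation (HPO3)


def candidate_masses(peptide_mass, n_sites, mod_mass=PHOSPHO, allow_variable_mod=True):
    """Theoretical masses for 0..n_sites copies of the modification."""
    max_mods = n_sites if allow_variable_mod else 0
    return {k: peptide_mass + k * mod_mass for k in range(max_mods + 1)}


def find_hit(observed, peptide_mass, n_sites, tol=0.01, allow_variable_mod=True):
    """Return the number of modifications that explains the observed mass, or None."""
    masses = candidate_masses(peptide_mass, n_sites, allow_variable_mod=allow_variable_mod)
    for n_mods, mass in masses.items():
        if abs(mass - observed) <= tol:
            return n_mods
    return None


unmodified = 1000.5                 # hypothetical peptide mass
observed = unmodified + PHOSPHO     # singly phosphorylated form is what was measured

print(find_hit(observed, unmodified, n_sites=2, allow_variable_mod=True))   # 1
print(find_hit(observed, unmodified, n_sites=2, allow_variable_mod=False))  # None
```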

Protein Phosphorylation

• One-third of all cellular proteins are phosphorylated at one time or another
• Phosphoamino acid content of a vertebrate cell: Serine - 90%; Threonine - 10%; Tyrosine - 0.05%
• Ser:Thr:Tyr - 1800:200:1
• Tyrosine phosphorylation is tightly regulated
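The percentages and the 1800:200:1 ratio above are two views of the same numbers, as a quick check shows (the percentages on the slide are rounded):

```python
ratio = {"Ser": 1800, "Thr": 200, "Tyr": 1}
total = sum(ratio.values())

# Convert each part of the ratio into a percentage of all phosphoamino acids
percent = {aa: 100.0 * count / total for aa, count in ratio.items()}

for aa, value in percent.items():
    print(f"{aa}: {value:.2f}%")
```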

Identifying Activated Tyrosine Kinases in Lung Cancer

[Figure: triple-state SILAC workflow with three cell lines (HBEC, HBEC L858R, HBEC Del 746-750) and three labeling states (Light; K/D4 and R/13C6; K/13C6,15N2 and R/13C6,15N4). Mix lysate, enrich tyrosine phosphorylated proteins, elute, digest with trypsin, analyze by mass spectrometry.]

Anti-acetyl lysine peptide IP

• Cell lysate in urea buffer
• Trypsin digestion
• Sep-Pak C18 reverse-phase purification
• Purified peptides
• Lyophilization
• Immunoaffinity purification (anti-acetyl lysine-Protein A agarose)
• Elution using 1% acetic acid
• Clean up peptides using C-18 tip
• LC-MS/MS

[Figure labels: GSK3 beta]

Why is Phosphorylation Analysis Difficult?

• Stoichiometry of phosphorylation is low
• Complete coverage of proteins is difficult to obtain
• Phosphorylated serine and threonine residues are labile, whereas phosphotyrosines are more stable
• Phosphoserine and phosphothreonine residues can be subjected to a beta-elimination reaction, but phosphotyrosine residues cannot
• Antibodies to enrich for serine and threonine phosphorylated proteins are not available
• Phosphopeptides are ‘suppressed’ in a mass spectrum

Phosphotyrosine Immonium Ion (216.043 Da)

[Figure: chemical structures of a phosphotyrosine residue and of its positively charged immonium ion.]

Immonium Ion of Phosphotyrosine as a Reporter Ion
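The reporter ion mass can be checked from standard monoisotopic values: an immonium ion is the residue minus CO plus a proton. This is a sketch of that arithmetic only, using the tyrosine residue mass plus the HPO3 gain of phosphorylation.

```python
TYR_RESIDUE = 163.06333   # Da, tyrosine residue
PHOSPHO = 79.96633        # Da, HPO3 added by phosphorylation
CO = 27.994915            # Da, carbonyl lost when the immonium ion forms
PROTON = 1.007276         # Da


def immonium_mz(residue_mass):
    """m/z of the singly charged immonium ion of a residue."""
    return residue_mass - CO + PROTON


print(round(immonium_mz(TYR_RESIDUE), 3))            # unmodified tyrosine
print(round(immonium_mz(TYR_RESIDUE + PHOSPHO), 3))  # phosphotyrosine
```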

Phosphotyrosine-Specific Precursor Ion Scanning – Bcr/Abl

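[Figure: gel of an anti-pTyr IP (IP: anti-pTyr) from control and p185 Bcr/Abl samples, with markers at 200, 97, 68, 43 and 28 and bands marked by asterisks. Beside it, an MS scan and a phosphotyrosine-specific precursor ion scan (m/z 500-900), with zoomed regions at m/z 538-544, 590-610 and 608-624.]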

Large scale IP with an anti-pSer/Thr antibody

[Figure: Western blot of lysates (WB: anti-pSer/Thr) and in vivo labeling with 32P followed by IP with anti-pSer/Thr, each without (-) and with (+) calyculin A.]

Identification of Phosphorylated Ser/Thr residues

[Figure: identified phosphopeptide 10-RLpSPAPQLGP-19.]

MS-Based Identification of a 130 kDa Protein in the EGF Receptor Signaling Pathway

[Figure: silver stained gel of an anti-pTyr IP (IP: anti-pTyr), lanes labeled by EGF treatment. Markers: 200, 110, 73, 47, 28.]

Partial list of acetylated lysine containing peptides identified in an IP experiment

Score   Peptide         Protein name
36      QELTKKLMPNCK    Block of proliferation 1
40      TKAVSR          H2A histone family, member Z
41      LKKVENIK        Putative homeodomain transcription factor 1
43      KNQLVQKRBP-binding zinc finger protein, c
54      KGKPQVQDKVVK    Hypothetical protein [gi|89059164]
30      GKPEVVSVVGR     Immunoglobulin superfamily, member 9
40      GLGKGGAKR       Histone cluster 1, H4a

>KIAA0229 (1180 residues) FRAGMENT

SWGKGREGVVSPAGLGGALPGDGKFGSPSRLGCSLGEGVQRVAALGMGKEQELLRAARTGHLPAVEKLLSGKRLSSGFGGGGGGGSGGGGGGSGGGGGGLGSSSHPLSSLLSMWRGPNVNCVDSTGYTPLHHAALNGHHRRSSSSRSQDSAEGQDGQVPEQFSGLLHGSSPVCEVGQDPFQLLCTAGQSHPDGSPQQGACHKASMQLEETGVHAPGASQPSALDQSKRVGYLTGLPTTNSRSHPETLTHTASPHPGGAEEGDRSGAR

Assignment of the initiator methionine in a cDNA ‘fragment’ based on an N-terminal peptide

>KIAA0229 (1180 residues) FRAGMENT

SWGKGREGVVSPAGLGGALPGDGKFGSPSRLGCSLGEGVQRVAALGMGKEQLLRAARTGHLPAVEKLLSGKRLSSGFGGGGGGGSGGGGGGSGGGGGGLGSSSHPLSSLLSMWRGPNVNCVDSTGYTPLHHAALNGHHRRSSSSRSQDSAEGQDGQVPEQFSGLLHGSSPVCEVGQDPFQLLCTAGQSHPDGSPQQGACHKASMQLEETGVHAPGASQPSALDQSKRVGYLTGLPTTNSRSHPETLTHTASPHPGGAEEGDRSGAR

Assignment of the initiator methionine in a cDNA ‘fragment’ based on an N-terminal peptide

[Figure: structure diagram of the N-terminal peptide, with labels CH3, C=O and H2N and residues G, K, E, Q, L, L, R.]

Lysine Modifications

Residue                   Mass (Da)   Mass gain
Lysine                    128.095     -
Mono-methylated lysine    142.111     14.016 Da
Di-methylated lysine      156.127     28.032 Da
Tri-methylated lysine     170.143     42.048 Da
Acetylated lysine         170.105     42.010 Da

[Figure: chemical structures of lysine and of its mono-methylated, di-methylated, tri-methylated and acetylated forms.]

Acetylation vs Tri-methylation: a numbers game…

Tri-methylation: 42.046950 Da [H(6) C(3)]

Acetylation: 42.010565 Da [H(2) C(2) O]

Mass difference: 0.036385 Da
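The two mass gains can be recomputed from the elemental compositions given above, and the difference converted into a rough resolving power requirement (m/z divided by the peak spacing). The m/z of 606.35 and the charge of 2 used below are assumed illustrative values, and treating m/Δm as the requirement is a simplification.

```python
# Monoisotopic atomic masses in Da (standard values)
MONO = {"H": 1.00782503, "C": 12.0, "O": 15.99491462}


def composition_mass(composition):
    """Monoisotopic mass of an elemental composition such as {"H": 6, "C": 3}."""
    return sum(MONO[element] * count for element, count in composition.items())


trimethyl = composition_mass({"H": 6, "C": 3})
acetyl = composition_mass({"H": 2, "C": 2, "O": 1})
difference = trimethyl - acetyl


def resolving_power_needed(mz, delta_mass, charge):
    """Rough m/(delta m) needed to separate two species differing by delta_mass."""
    return mz / (delta_mass / charge)


print(round(trimethyl, 6), round(acetyl, 6), round(difference, 6))
print(round(resolving_power_needed(606.35, difference, 2)))
```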

[Figure: spectra of the m/z 738-743 region recorded on three instruments. Zoom scan: the scale is the same.]

Resolution of various mass spectrometers

Instrument                   Approx. resolution
Normal ion trap              <3,000
Quadrupole-Time-of-Flight    8,000-14,000
Orbitrap                     30,000-60,000

A triple state SILAC experiment: qTOF

[Figure: qTOF spectrum over m/z 538-545. Labeled peaks include 538.9988, 539.3308, 539.6641, 541.6711, 542.0114, 542.3391, 544.3357 and 544.6752. Annotation: +2x4 Da, +2x4 Da (2x lysine containing peptide).]

A triple state SILAC experiment: Orbitrap

[Figure: Orbitrap full MS scan (FTMS + p NSI Full ms [375.00-1800.00], scan #4485, RT 28.81, NL 7.21E6), zoomed to m/z 538.0-545.5, relative abundance vs. m/z. Annotated peaks: 538.32, 540.32 and 541.65 with +6 Da and +4 Da (arginine containing peptide); 539.00, 541.68 and 544.34 with +2x4 Da and +2x4 Da (2x lysine containing peptide).]

Identifying acetylation sites using Orbitrap

• Digest protein mixture into peptides
• Capture acetylated lysine containing peptides using anti-acetyl lysine antibodies
• Analyze on Orbitrap
• If the samples are labeled for quantitative proteomics, one can also obtain data on differential regulation of acetylation at individual lysine residues in this strategy

Peptide Capture using anti-acetyl lysine antibody

[Figure: workflow. Cell lysate in urea buffer, trypsin digestion, Sep-Pak C18 reverse-phase purification, purified peptides, lyophilization, immunoaffinity purification (anti-acetyl lysine-Protein A agarose), elution using 1% acetic acid, clean up peptides using C-18 tip, LC-MS/MS.]

GGKGLGKGGAKR peptide from Histone Cluster 1, H4a: What are the PTMs?

Measured mass: 606.3452

Candidates (m/z axis 606.2-606.6):
• 606.40095 (3x tri-methylation)
• 606.38276 (1x acetylation + 2x tri-methylation)
• 606.36457 (2x acetylation + 1x tri-methylation)
• 606.34638 (3x acetylation)

[Figure: ion trap MS/MS spectrum of the peptide (ITMS + c NSI d Full ms2 [155.00-1225.00], scan #3001, RT 22.47, NL 1.54E4), relative abundance vs. m/z 200-1200.]

[Figure: Orbitrap full MS scan (FTMS + p NSI Full ms [375.00-1800.00], scan #3003, RT 22.48, NL 5.43E6), zoomed to m/z 605.8-608.2. Isotope peaks at 606.3452 (theoretical: m/z 606.3477), 606.8466 and 607.3477.]

Answer: Acetylation at each of 3 lysine residues: GGK*GLGK*GGAK*R
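The four candidate values can be recomputed and ranked against the measured m/z. The sketch below assumes a doubly protonated precursor, which matches the 0.5 spacing of the isotope peaks, and uses the acetylation and tri-methylation mass gains quoted earlier in the deck.

```python
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "L": 113.08406,
                "K": 128.09496, "R": 156.10111}   # monoisotopic, Da
WATER, PROTON = 18.010565, 1.007276
ACETYL, TRIMETHYL = 42.010565, 42.046950


def precursor_mz(peptide, n_acetyl, n_trimethyl, charge=2):
    """m/z of the peptide carrying the given numbers of each modification."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    neutral += n_acetyl * ACETYL + n_trimethyl * TRIMETHYL
    return (neutral + charge * PROTON) / charge


MEASURED = 606.3452
# Three lysines, each either acetylated or tri-methylated: key is (acetyl, trimethyl)
candidates = {(a, 3 - a): precursor_mz("GGKGLGKGGAKR", a, 3 - a) for a in range(4)}
best = min(candidates, key=lambda combo: abs(candidates[combo] - MEASURED))

for combo, mz in sorted(candidates.items()):
    ppm = (MEASURED - mz) / mz * 1e6
    print(combo, round(mz, 5), f"{ppm:+.1f} ppm")
print("closest (acetyl, trimethyl):", best)
```

Only the three-fold acetylated form falls within a few ppm of the measured value; each tri-methylation moves the candidate about 30 ppm further away.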


Lysine Modifications

[Figure: structure of di-glycyl lysine. Two glycines (Gly-Gly) are attached to the side chain of the lysine residue through an isopeptide bond.]

Mass gain: 114.05 Da
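The mass gain is simply two glycine residues, which also gives the residue mass of a GG-modified lysine as it would appear as a spacing in a fragment ladder. A quick check with standard monoisotopic values:

```python
GLY_RESIDUE = 57.02146    # Da
LYS_RESIDUE = 128.09496   # Da

diglycine_gain = 2 * GLY_RESIDUE                  # remnant left on lysine
gg_lysine_residue = LYS_RESIDUE + diglycine_gain  # modified lysine residue

print(round(diglycine_gain, 2), round(gg_lysine_residue, 2))
```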

Tryptic Digest of Ubiquitylated Peptide

[Figure: MS/MS spectrum (m/z 100-1400) of the peptide LIFAGKQLEDGR carrying a GG remnant on the lysine. Labeled peaks: 175.12, 232.14, 347.17, 476.21, 589.30, 717.35, 959.49, 1016.51, 1087.55, 1234.62, 1347.72. The spacing across the GG-modified lysine is 242.14 Da.]

Signature of a Ubiquitylated Peptide

[Figure: low-mass region (m/z 100-158) of MS/MS spectra from a di-glycine containing peptide and a non di-glycine containing peptide, with an immonium ion marked. Labeled peaks: 112.08, 116.06, 120.07, 121.08, 129.10, 141.10, 158.09.]

Genome Annotation

- A case for proteomics-driven annotation of protein-coding regions

Genome Annotation by Mass Spectrometry: What Can We Gain?

• Assigning start codons
• Protein isoforms (alternative splicing, novel exons)
• Novel genes (proteins less than 100 amino acids not predicted by programs)
• cSNPs
• Correction of incorrect gene predictions (~50% of the genes in human are predicted)
• Validation of gene predictions

Transcripts and Proteins in Anopheles gambiae

• Proteins ~ 700 known
• Annotated genes: 15,189 (11,757 predicted)
  – 7,840 unique to predictions by Celera
  – 1,375 unique to predictions by Ensembl
  – 5,974 common to both predictions
• Gomez et al. (Genome Biology 6:R39, 2005)
  – 35,000 full-length enriched cDNAs
  – 3,700 genes of which only 650 are novel
• Kriventseva et al. (Genome Research, 2005)
  – 215,634 ESTs/cDNAs
  – 7,961 clusters of which 3,100 are novel

Use of Ensembl Distributed Annotation System to Validate a Known Transcript

Schematic representation of the systems biology paradigm involving proteomics.