Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester http://www.cogeme.man.ac.uk http://www.bioinf.man.ac.uk

Post on 19-Dec-2015

Page 1: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Experimental & Bioinformatic Tools for Proteomics

Steve Oliver

Professor of Genomics, Faculty of Life Sciences

The University of Manchester http://www.cogeme.man.ac.uk

http://www.bioinf.man.ac.uk

Page 2: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Functional Genomics

Level of analysis, with definition, status and method of analysis:

Genome
Definition: Complete set of genes of an organism or its organelles.
Status: Context-independent (modifications to the yeast genome may be made with exquisite precision).
Method of analysis: Systematic DNA sequencing.

Transcriptome
Definition: Complete set of mRNA molecules present in a cell, tissue or organ.
Status: Context-dependent (the complement of mRNAs varies with changes in physiology, development or pathology).
Method of analysis: Hybridisation arrays. SAGE. High-throughput Northern analysis.

Proteome
Definition: Complete set of protein molecules present in a cell, tissue or organ.
Status: Context-dependent.
Method of analysis: 2-D gel electrophoresis. Peptide mass fingerprinting. Two-hybrid analysis.

Metabolome
Definition: Complete set of metabolites (low molecular weight intermediates) present in a cell, tissue or organ.
Status: Context-dependent.
Method of analysis: Infra-red spectroscopy. Mass spectrometry. Nuclear magnetic resonance spectrometry.

Page 3: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

GENOME

TRANSCRIPTOME

PROTEOME

METABOLOME

Page 4: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Proteomics

Separation

Identification

Quantitation

Bioinformatics

Page 5: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Diagram: bioinformatics and identification]

genome → knowledge + prediction (including post-translational modification) → "virtual" proteome → [digest] → peptide mass database

real proteome → separation methods (2D-gels, functional separations, n-dimensional chromatography) → simple mixtures & single proteins, or complex mixtures & subsets → [digest] → simple or complex peptide map fingerprint

Complex mixture analysis

Page 6: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Aberdeen PRF1: S. cerevisiae 2D map

[Figure: 2D gel map with a horizontal scale from 4.0 to 6.5 and a vertical scale from 150 down to 10. Spots are marked + and labelled with gene names, including SSE1, SSC1, SSB1, VMA1, HIS4, CDC48, SSA1, SSA2, HSP60, ATP2, ACT1, PDC1, ENO1, ENO2, FBA1, TDH3, ADH1, PGK1, CDC19, TPI1, SOD1 and HSP26; several proteins (e.g. PDC1, ENO2, FBA1) label more than one spot.]

Page 7: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Peptide mass fingerprinting

denature

KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCLPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV

digest (trypsin)

m1 KETAAAK
m2 FER
m3 QHMDSSTSAASSSNYCNQMMK
m4 SR
m5 NLTK
m6 DR
m7 CLPVNTFVHESLADVQAVCSQK
m8 NVACK
m9 NGQTNCYQSYSTMSITDCR
m10 ETGSSK
m11 YPNCAYKTTQANK
m12 HIIVACEGNPYVPVHFDASV

mass spectrometry

[Spectrum: abundance against mass, with peaks labelled m10, m1, m11, m12, m9 and m7]
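The database side of peptide mass fingerprinting can be sketched in a few lines. The example below is only an illustration: it assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), unmodified residues and monoisotopic masses, and the function names are made up for this sketch. Applied strictly, this rule also splits off the N-terminal K and cuts inside YPNCAYKTTQANK, so it gives more peptides than the twelve drawn on the slide.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added for the singly protonated ion

# Sequence shown on the slide.
SEQ = ("KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCLPVNTFVHESLADVQAVCSQK"
       "NVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV")


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


for pep in tryptic_digest(SEQ):
    print(f"{mh_plus(pep):10.4f}  {pep}")
```

A real search engine would also allow for missed cleavages and modifications, which is one reason the peptides on the slide need not match the strict rule.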

Page 8: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Proteomic applications

• Quantitative Proteomics
– “Expression” proteomics
• protein levels under different conditions/times

• Qualitative Proteomics
– Identification proteomics
• protein:protein interactions
• post-translational modifications

Page 9: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

“A MASS SPECTROMETER MEASURES THE MW….”

“...A MS ANALYSIS GIVES THE MASS-TO-CHARGE RATIO (m/z) FOR IONS…IN GAS PHASE”.

Brancia FL, Trieste, 12/02/2004

Page 10: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

What is a “mass spectrometer”...?

Brancia FL, Trieste, 12/02/2004

Page 11: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Diagram: components of a mass spectrometer]

Sample introduction: DIRECT INTRODUCTION (solid, liquid, gas) or SEPARATION TECHNIQUES (HPLC, CE, GC)

ION SOURCE (“ion generation”): EI, FAB, MALDI, Electrospray

ANALYZER (“mass analysis”): TOF, quadrupole, ion trap

Detector

Data Processing

Pumping system (vacuum)

Brancia FL, Trieste, 12/02/2004

Page 12: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Various ionisation methods

• Electron impact ionisation (1919, A.J. Dempster)
• Chemical ionisation, CI
• Fast atom bombardment, FAB (1981, M. Barber)
• Matrix-assisted laser desorption ionisation, MALDI (1988, K. Tanaka, M. Karas, F. Hillenkamp)
• Electrospray, ES (1985, J. Fenn)

Brancia FL, Trieste, 12/02/2004

Page 13: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

‘Soft’ Ionisation Techniques

‘Soft’ refers to the low amount of energy imparted into the analyte during ionisation. Too much internal energy will result in fragmentation. Soft ionisation techniques form intact molecular or pseudo-molecular (M+H) ions.

Matrix-assisted laser desorption ionisation (MALDI)

Electrospray (ES)

Brancia FL, Trieste, 12/02/2004

Page 14: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

“...for their developments of soft desorption ionisation methods for mass spectrometric analysis of biological macromolecules”.

Nobel Prize in Chemistry 2002

1/2 of the prize went to Kurt Wüthrich (Switzerland) for the development of NMR analysis

1/4 to John B. Fenn (USA), Virginia Commonwealth University: Electrospray Ionization

1/4 to Koichi Tanaka (Japan), Shimadzu Corp., Kyoto: Laser Ionization

Brancia FL, Trieste, 12/02/2004

Page 15: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Electrospray (ES)

Brancia FL, Trieste, 12/02/2004

Page 16: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Diagram: electrospray ion source. The sample solution enters through the electrospray capillary, held at +HV at atmospheric pressure, opposite a counter electrode (near ground). A potential gradient and a pressure gradient lead through the skimmer electrodes to the mass analyzer at high vacuum.]

Droplet shrinks due to solvent evaporation

Droplet explodes due to charge density limit

Gaseous ions [M+nH]n+ formed via one of two proposed mechanisms

Brancia FL, Trieste, 12/02/2004

Page 17: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

The principal outcome of the electrospray process is the transfer of analyte species, generally ionised in condensed phase, into the gas phase as isolated entities

[Diagram: capillary at +HV emitting an aerosol of charged droplets]

Gaskell SJ, Journal of Mass Spectrometry 1997

Brancia FL, Trieste, 12/02/2004

Page 18: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

ES spectrum of Rho protein

[Spectrum: relative abundance (0 to 100 %) against m/z (600 to 1300). A series of multiply charged peaks is labelled from m/z 653.9 to 1001.2, including 840.3 marked [M+56H]56+ and 941.0 marked [M+50H]50+.]

Rho Protein: 47004.33 Da

Courtesy of Dr Matt Openshaw

Brancia FL, Trieste, 12/02/2004

Page 19: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Electrospray (ES)

[M+56H]56+ = 840.3 m/z

Therefore, M = (840.3 × 56) − 56 = 47000.8 Da

Deconvolution: takes all the multiply charged ions and converts them into a spectrum on a mass (Da) scale, i.e. works out what the molecular weight is most likely to be.

Brancia FL, Trieste, 12/02/2004
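The charge-state arithmetic on this slide can be written out as a small sketch. It assumes that two neighbouring peaks in the electrospray series differ by exactly one proton, and it uses the proton mass 1.00728 Da rather than the rounded value of 1 used on the slide, so the result differs slightly from 47000.8 Da. The function names are illustrative only.

```python
PROTON = 1.00728  # mass of a proton in Da


def charge_from_adjacent(mz_low, mz_high):
    """Charge of the peak at mz_low, assuming the peak at mz_high
    is the same molecule carrying one proton fewer."""
    return round((mz_high - PROTON) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral mass M of an [M+zH]z+ ion observed at m/z = mz."""
    return z * (mz - PROTON)


# Two neighbouring peaks from the Rho protein spectrum.
z = charge_from_adjacent(840.3, 855.6)
mass = neutral_mass(840.3, z)
print(z, round(mass, 1))
```

A deconvolution routine effectively repeats this for every peak in the series and combines the estimates, which is why the deconvoluted value can be more precise than any single peak.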

Page 20: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

ES spectrum after deconvolution

[Spectrum: relative abundance (0 to 100 %) against mass (44000 to 50000), with a single peak labelled 47004.9]

47004.0 Da

Brancia FL, Trieste, 12/02/2004

Page 21: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Advantages

• Production of molecular ions from solution

• The ease of coupling with separation techniques (micro LC-MS/MSMS, nano LC-MS/MSMS)

• Production of multiply charged ions

Brancia FL, Trieste, 12/02/2004

Page 22: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Matrix Assisted Laser Desorption Ionisation

MALDI

Time-of-Flight

Brancia FL, Trieste, 12/02/2004

Page 23: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Matrix assisted laser desorption ionisation (MALDI)

Matrices [chemical structures shown on the slide]:

α-cyano-4-hydroxycinnamic acid (CHCA)

2,5-dihydroxybenzoic acid (DHB)

trans-3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid; SA)

Typically used with a nitrogen laser (337 nm)

Brancia FL, Trieste, 12/02/2004

Page 24: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

MALDI is an efficient desorption ionisation technique for producing gaseous ions from a solid sample by laser pulses

[M+H]+

Brancia FL, Trieste, 12/02/2004

Page 25: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Matrix Assisted Laser Desorption/Ionisation (MALDI)

Unlike ES, MALDI forms predominantly singly charged ions, e.g. [M+H]+ or adducts (sodium [M+Na]+ or potassium [M+K]+)

Sodium = 23 amu
Potassium = 39 amu

[Spectrum: [M+Na]+ lies 22 m/z above [M+H]+, and [M+K]+ lies 38 m/z above [M+H]+]

Brancia FL, Trieste, 12/02/2004
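The 22 and 38 m/z spacings follow from replacing the proton with a sodium or potassium ion. The sketch below uses monoisotopic atomic masses instead of the nominal 1, 23 and 39 amu quoted on the slide; the function name and the example mass are illustrative assumptions.

```python
# Monoisotopic atomic masses in Da.
H, NA, K = 1.00783, 22.98977, 38.96371
ELECTRON = 0.00055  # removed to give the singly charged cation


def adduct_mz(neutral_mass, cation_atom_mass):
    """m/z of a singly charged [M+X]+ ion, X being H, Na or K."""
    return neutral_mass + cation_atom_mass - ELECTRON


M = 1500.0  # hypothetical neutral peptide mass
mh = adduct_mz(M, H)
print(round(adduct_mz(M, NA) - mh, 3))  # sodium adduct offset from [M+H]+
print(round(adduct_mz(M, K) - mh, 3))   # potassium adduct offset from [M+H]+
```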

Page 26: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Why is the matrix so important?

• Matrix is necessary to dilute and disperse the analyte

• It functions as an energy mediator for ionising the analyte itself or other neutral molecules

• It forms an activated state produced by photo ionisation

Brancia FL, Trieste, 12/02/2004

Page 27: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Advantages

• MALDI primarily creates singly charged ions [M+H]+
• Less sensitive to contaminants
• Sensitivity at femtomole level
• High throughput analysis

Brancia FL, Trieste, 12/02/2004

Page 28: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Time-of-flight (ToF) mass spectrometer

[Diagram: MALDI target (t = 0), extraction grid, flight tube (field-free region), detector (t > 0)]

$mv^2/2 = zV$

$t^2 = \frac{m}{z}\left(\frac{d^2}{2V}\right)$

Brancia FL, Trieste, 12/02/2004
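To give a feel for the numbers, the flight-time relation can be evaluated in SI units. This is a sketch under simple assumptions: an ideal linear instrument, all acceleration completed before the field-free region, and an accelerating voltage (20 kV) and flight length (1 m) chosen purely for illustration. Here z counts elementary charges, so the charge is z·e.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # 1 Da in kg


def flight_time(mass_da, z, voltage, length_m):
    """Flight time in seconds from t^2 = (m/z) * d^2 / (2V), charge = z*e."""
    mass_kg = mass_da * DALTON
    return length_m * math.sqrt(mass_kg / (2 * z * E_CHARGE * voltage))


t_1000 = flight_time(1000.0, 1, 20_000.0, 1.0)
t_4000 = flight_time(4000.0, 1, 20_000.0, 1.0)
print(f"{t_1000 * 1e6:.1f} us, {t_4000 * 1e6:.1f} us")
```

Because t scales with the square root of m/z, a fourfold heavier ion of the same charge takes twice as long to reach the detector.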

Page 29: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Reflectron-time of flight mass analyser

[Diagram: laser, target, accelerating voltage (VACCEL), electrostatic mirror, detector 1 and detector 2]

Brancia FL, Trieste, 12/02/2004

Page 30: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Comparison of MALDI and ESI

Sensitivity = femtomole, 10^-15 mol (...attomole, 10^-18 mol)

Simplicity = MALDI: very easy; ESI: training required

$$$ = MALDI: 70 to 650 k$; ESI: 120 to 650 k$

Speed (“high throughput”) = MALDI: ~10^4/day; ESI: dynamic system

Structural information = MALDI: MSn; ESI: MSn

Selectivity (“resolution”) = >5000

Software = “...evaluation in progress.”

Brancia FL, Trieste, 12/02/2004

Page 31: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Structural information can be achieved by tandem mass spectrometry

Brancia FL, Trieste, 12/02/2004

Page 32: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

The tandem mass spectrometry experiment

Ion source (e.g. electrospray)

Analyser 1 (e.g. quadrupole)

Decomposition region (collisionally activated decomposition, CAD)

Analyser 2 (e.g. quadrupole, time-of-flight)

Brancia FL, Trieste, 12/02/2004

Page 33: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Diagram: tandem MS with an ion source, ion beam, MS1, collision cell (collision gas molecules), MS2 and ion detector. A precursor ion m+ and fragment ions f1+ to f4+ are shown. Panels (a) and (b) each plot TIC against m/z: (a) shows m with f1, f2, f3 and f4; (b) shows m with f1 and f3.]

Brancia FL, Trieste, 12/02/2004

Page 34: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

PROBLEMS WITH ‘CLASSICAL’ PROTEOME ANALYSIS:

1. Not comprehensive

2. Not high-throughput

3. Destroys protein-protein interactions that provide important clues to function

Page 35: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Figure: number of (protein) database matches (0 to 450) plotted against peptide mass (1000 to 2000 Da) for C. elegans, S. cerevisiae, E. coli and H. influenzae]

Page 36: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

• Multidimensional protein identification technology (MudPIT)

• Washburn MP, et al Nat Biotechnol 2001, 19:242-247.

Reverse Phase / SCX

Load complete digest of sample

MS/MS

Develop with gradient and spray directly onto MSMS

Identified 1500 proteins from yeast including lower abundance species and membrane proteins

2415 (46%) of Plasmodium genome identified in all 4 stages of parasitic life cycle

Page 37: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Just Enough Diagnostic Information

Page 38: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Sidhu KS, Sangavich P, Brancia FL, Sullivan AG, Gaskell SJ, Wolkenhauer O, Oliver SG, Hubbard SJ (2001) Bioinformatic assessment of mass spectrometric chemical derivatisation techniques for proteome database searching. Proteomics 1, 1368-1377.

Page 39: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Provide limited sequence information by:

1. Identification of N-terminal amino acid by PTC derivatisation

2. Use guanidination to identify C-terminus, determine lysine content, and improve signal response

3. Specifically fragment next to Asp residues using MALDI-QToF MS

Page 40: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

PTC-derivatisation

• phenylthiocarbamoyl derivative
• Edman chemistry
• N-terminal amino acid
• b1 ion created via low energy collisions
• precursor ion scan gives parents
• increased sensitivity

[Diagram: peptide ions → MS1 (scan for precursors) → collision cell → MS2 (fixed on b1)]

Spectra collected of all peptides which give rise to a given b1 ion (implying knowledge of the N-terminal amino acid)

Page 41: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Database peptide hits by N-terminal amino acid

Error = ± 0.5 Da

N-terminal amino acid    mean number of peptides
ANY                      74.15
W                        1.70
C                        1.77
H                        2.30
M                        3.41
...
N                        5.61
I                        5.76
E                        6.04
S                        7.18
L                        8.39
...
I/L                      14.16

Average number of matching proteins in the yeast proteome when searching with a peptide mass in the 1000-2000 Da range

Rare amino acids give a bigger search gain
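The effect of knowing the N-terminal residue can be illustrated with a toy search. Everything in the example is hypothetical: the four-entry "database", its protein names and its peptide masses are placeholders (the masses are not computed from the sequences), and `matching_peptides` is an illustrative name rather than any real search engine's API. Only the ± 0.5 Da window comes from the slide.

```python
def matching_peptides(database, mass, tol=0.5, n_term=None):
    """Return (protein, peptide) pairs whose mass lies within +/- tol of
    `mass`, optionally keeping only peptides whose first residue is one
    of the letters in `n_term`."""
    hits = []
    for protein, peptide, pep_mass in database:
        if abs(pep_mass - mass) > tol:
            continue
        if n_term is not None and peptide[0] not in n_term:
            continue
        hits.append((protein, peptide))
    return hits


# Hypothetical entries: (protein id, peptide, placeholder mass in Da).
TOY_DB = [
    ("P1", "WAGK", 1500.2),
    ("P2", "LSEK", 1500.4),
    ("P3", "ITDR", 1499.8),
    ("P4", "SGGR", 1502.0),
]

print(len(matching_peptides(TOY_DB, 1500.0)))               # mass only
print(len(matching_peptides(TOY_DB, 1500.0, n_term="W")))   # rare residue
print(len(matching_peptides(TOY_DB, 1500.0, n_term="IL")))  # isobaric I/L
```

I and L are passed together because they have the same mass and cannot be told apart by mass alone, which matches the combined I/L row in the table.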

Page 42: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Guanidination of Lysine

[Reaction scheme: lysine + O-methyl isourea → homoarginine]
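Converting lysine to homoarginine adds CH2N2, about 42.0218 Da, to each lysine, so the lysine content of a peptide can be read off from the mass shift between the spectra before and after derivatisation. The sketch below assumes complete derivatisation and that only lysine side chains react; the function name, tolerance and example masses are illustrative.

```python
GUANIDINATION_SHIFT = 42.0218  # Da added per lysine (lysine -> homoarginine)


def lysine_count(mass_before, mass_after, tol=0.05):
    """Number of lysines implied by the mass shift, or None when the
    shift is not a whole multiple of 42.0218 Da within `tol`."""
    shift = mass_after - mass_before
    n = round(shift / GUANIDINATION_SHIFT)
    if n < 0 or abs(shift - n * GUANIDINATION_SHIFT) > tol:
        return None
    return n


print(lysine_count(1000.50, 1042.52))  # one lysine
print(lysine_count(1000.50, 1084.54))  # two lysines
print(lysine_count(1000.50, 1020.50))  # shift does not fit: None
```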

Page 43: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

MALDI spectrum of an enolase tryptic digest

[Spectrum: intensity (0 to 2000) against mass (m/z, 1000 to 2500); six peaks are labelled R and three are labelled K]

Page 44: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

MALDI spectrum of a tryptic digest of enolase after guanidination

[Spectrum: intensity (0 to 6000) against mass (m/z, 800 to 2600); peaks are labelled *K or R, with *K peaks now the more numerous]

Page 45: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

1. Initial set of search peptides and associated information

2. Search database, compile protein “hit list” with matching peptides

3. Top-scoring protein is matched. Remove corresponding peptides from search list

4. If all initial search peptide masses are matched, stop, else continue searching
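The loop described on this slide can be sketched as follows. The scoring here is an assumption made for illustration (a protein's score is simply the number of remaining search masses it matches), the toy database is hypothetical, and the loop also stops when no protein matches anything, a case the slide does not cover.

```python
def iterative_search(search_masses, database, tol=0.5):
    """database maps protein id -> list of theoretical peptide masses.
    Returns (identified, unmatched): identified is a list of
    (protein, matched masses) in the order the proteins were assigned."""
    remaining = list(search_masses)
    identified = []
    while remaining:
        # Compile the hit list for the masses still unexplained.
        hits = {}
        for protein, pep_masses in database.items():
            matched = [m for m in remaining
                       if any(abs(m - p) <= tol for p in pep_masses)]
            if matched:
                hits[protein] = matched
        if not hits:
            break  # nothing in the database explains what is left
        # Assumed score: number of matching peptides.
        best = max(hits, key=lambda prot: len(hits[prot]))
        identified.append((best, hits[best]))
        # Remove the explained peptides and search again.
        remaining = [m for m in remaining if m not in hits[best]]
    return identified, remaining


# Hypothetical toy database and search list.
TOY_PROTEINS = {
    "A": [1000.0, 1200.0, 1500.0],
    "B": [1100.0, 1300.0],
    "C": [1200.2],
}
found, left = iterative_search(
    [1000.1, 1200.1, 1500.2, 1100.3, 1750.0], TOY_PROTEINS)
print([prot for prot, _ in found], left)
```

Removing the top protein's peptides before the next round is what lets a second, less abundant protein in a mixture rise to the top of the hit list.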

Page 46: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Real yeast proteomics

• Alternatives to 2D-gels
– denaturing technology
– low abundance spots difficult to identify

• Many steps of orthogonal 1D-steps
– Size exclusion chromatography
– Ion exchange chromatography
– 1D-gels

Page 47: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Yeast proteome sample

[Figure: MALDI spectra before guanidination and after guanidination; mass (m/z) axis from 800 to 1800, with a break to 3600. Peaks are annotated with their masses and as R or K; legible peak labels include 795.23, 925.33, 1040.30, 1150.49, 1210.39, 1416.55, 1512.69, 1752.65 and 3612.77.]

Page 48: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Database search gains

Standard MALDI, 7 search peptides (before guanidination): 1656 proteins match at least 1 peptide

Standard MALDI, 12 search peptides (after guanidination): 2549 proteins match at least 1 peptide

Combined 19 (7 + 12) search peptides (both experiments): 3235 proteins match at least 1 peptide

Page 49: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Database search gains

Search peptides in common (5 from expt 1, 4 from expt 2): only 289 proteins match at least 1 peptide in both experiments

PTC derivatised, 3 peptides, N-term = Ile/Leu: only 204 proteins match at least 1 peptide

All 3 sets of experimental data combined: only 18 proteins match at least 1 peptide in all 3 experiments

Page 50: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester
Page 51: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester
Page 52: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Figure: four bar charts of % unambiguous identification (0 to 100) against total number of search peptides. Panels: yeast (S. cerevisiae) 1 protein and C. elegans 1 protein, with 1, 2 or 4 search peptides; yeast (S. cerevisiae) 2 proteins and C. elegans 2 proteins, with 2, 4 or 6 search peptides. Series in each panel: standard, guanidination, PTC (500), PTC (50), Asp-frag, Asp-frag (All).]

Page 53: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Improved bioinformatics approaches for complex mixtures

[Diagram: primary data (input masses) pass to a search engine, which queries a database (proteome, proteins, peptides) and gives a protein hit list (quantitative data). Secondary data (experimental proteome data) pass to a rule-based system, which gives protein information (qualitative data). Probability and possibility are brought together as combined evidence to give the final scores.]

Page 54: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Contextual information

pI (theoretical & experimental)

Molecular weight (oligomerisation state)

Subcellular localisation (known, predicted - PSORT)

Molecular environment (soluble, membrane, DNA-, actin-associated)

Post-translational modifications (known, putative, predicted)

Sequence motifs

Homology relationships

Non-native state digestions

Page 55: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Scoring systems

• Bayesian approach

– k is the hypothesis that the sample protein is protein k,
– D is the mass spec fingerprint data,
– I is background information,
– P(k|DI) is the posterior probability for k given D and I,
– P(k|I) is the prior probability of k given I,
– P(D|kI) is the likelihood of the data D given k and I,
– P(D|I) is a normalisation constant

$$P(k \mid DI) = \frac{P(k \mid I)\,P(D \mid kI)}{P(D \mid I)}$$
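A minimal numerical sketch of this scoring rule is given below. It assumes that the candidate proteins are mutually exclusive and exhaustive, so that the normalisation constant P(D|I) is the sum of prior times likelihood over all candidates; how the likelihoods P(D|kI) are actually obtained is not specified on the slide, and the values used here are invented for illustration.

```python
def posterior(priors, likelihoods):
    """priors: protein -> P(k|I); likelihoods: protein -> P(D|kI).
    Returns protein -> P(k|DI)."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())  # P(D|I), under the stated assumption
    return {k: v / evidence for k, v in joint.items()}


# Hypothetical candidates with a flat prior and made-up likelihoods.
priors = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
likelihoods = {"A": 0.6, "B": 0.3, "C": 0.1}
scores = posterior(priors, likelihoods)
print({k: round(v, 3) for k, v in scores.items()})
```

With a flat prior the ranking is decided by the likelihoods alone; contextual information of the kind listed on the previous slide would enter through a non-uniform prior.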

Page 56: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

QUANTITATIVE PROTEOMICS

Page 57: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

DiGE: Difference Gel Electrophoresis

• Ünlü M. et al. (1997). Difference gel electrophoresis: a single gel method for detecting changes in cell extracts. Electrophoresis, 18, 2071-2077

Page 58: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Difference Gel Electrophoresis

Sample 1: label with Cy2 in dark, 30 mins @ 4 °C

Sample 2: label with Cy3 in dark, 30 mins @ 4 °C

Sample 3: label with Cy5 in dark, 30 mins @ 4 °C

Quench un-reacted dye by adding 1 mM lysine in dark, 10 mins @ 4 °C

2D gel electrophoresis

Page 59: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Cy3 Cy5

Cy3 +Cy5

no difference ●

presence / absence ● ●

up / down-regulation ●

Page 60: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Stable Isotope Labelling

• In vivo labelling = isotopes introduced during cell culture

Pro: Cheap. Information rich.

Con: Only works for microbes and cell culture???? Very complex samples. Have to deduce sequence before assigning pairs.

[Spectrum: paired peaks labelled 14N and 15N on an m/z axis]
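The reason the sequence has to be deduced before pairs can be assigned is that the 14N/15N spacing depends on how many nitrogen atoms the peptide contains. The sketch below counts nitrogens per residue and assumes complete 15N incorporation and unmodified peptides; the function names are illustrative.

```python
# Nitrogen atoms per residue: one backbone N, plus side-chain nitrogens.
N_ATOMS = {aa: 1 for aa in "ACDEFGILMPSTVY"}
N_ATOMS.update({"N": 2, "Q": 2, "K": 2, "W": 2, "H": 3, "R": 4})

DELTA_15N = 0.99703  # mass difference between 15N and 14N, Da


def nitrogen_count(peptide):
    """Total number of nitrogen atoms in an unmodified peptide."""
    return sum(N_ATOMS[aa] for aa in peptide)


def heavy_shift(peptide, z=1):
    """m/z spacing between the 14N and fully 15N-labelled forms."""
    return nitrogen_count(peptide) * DELTA_15N / z


for pep in ("FER", "ETGSSK"):
    print(pep, nitrogen_count(pep), round(heavy_shift(pep), 3))
```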

Page 61: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Metabolic labelling of C. elegans

Growth of C. elegans on isotopically labelled E. coli

E. coli grown on 14N nitrogen source: light WT, light mutant

E. coli grown on 15N nitrogen source: heavy WT, heavy mutant

Krijgsveld et al. (2003) Nat. Biotech.

Also grew Drosophila on metabolically labelled yeast

Page 62: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

In vitro labelling - continued

I Isotopes introduced during proteolysis 18O – labelled water, C-termini

II Guanidinylation of lysine using isotopes of O-methyl isourea – lysine residues

III Dimethyl labelling – lysine residues

Pro: Cheap. Universal.

Con: Complex peptide mixture. Small mass difference on MS.

Page 63: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

ICAT – Isotope Coded Affinity Tags

Biotin affinity tag

Cleavable linker

Isotope coded linker: 227 / 236 amu (9 × 13C)

SH-reactive group (iodoacetamide)

Pros: Universal. Simplified sample.

Cons: Protein must contain cysteine.

Gygi SP, et al. Nat Biotechnol 1999, 17:994-999.
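Relative quantitation with such a tag comes down to finding peak pairs separated by the light/heavy mass difference and comparing their intensities. The sketch below assumes the nine 13C substitutions quoted above (about 9.03 Da per tag), one tag per peptide by default, and a hypothetical centroided peak list; the function name and tolerance are illustrative, not part of any vendor software.

```python
DELTA_TAG = 9 * 1.003355  # nine 12C -> 13C substitutions, Da


def icat_pairs(peaks, z=1, tags=1, tol=0.02):
    """peaks: list of (m/z, intensity). Returns a list of
    (light m/z, heavy m/z, heavy/light intensity ratio)."""
    spacing = tags * DELTA_TAG / z
    pairs = []
    for mz_light, int_light in peaks:
        for mz_heavy, int_heavy in peaks:
            if int_light > 0 and abs((mz_heavy - mz_light) - spacing) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


# Hypothetical peak list: one light/heavy pair and one unpaired peak.
PEAKS = [(1000.50, 200.0), (1009.53, 400.0), (1200.70, 50.0)]
print(icat_pairs(PEAKS))
```

Peptides with more than one cysteine carry more than one tag, so a fuller implementation would try several values of `tags` and of the charge `z`.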

Page 64: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

ICAT method

[Structure of the ICAT reagent: biotin, linker (heavy or light, with the isotope-coded positions marked X), thiol-specific reactive group]

Gygi S, Rist B et al. (1999) Nature Biotech. 17: 994.

Page 65: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Figure: ICAT workflow, starting from a control sample and a test sample]

• Denature (SDS) and reduce (TCEP)
• Label one sample with light reagent and the other with heavy reagent
• Pool samples

Page 66: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Figure: ICAT workflow, continued with the pooled, labelled sample]

• Digest overnight with trypsin
• Purify labelled peptides using avidin column
• Cleave biotin portion of the tag with concentrated TFA
• LC-MSMS
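After pooling, the light- and heavy-labelled forms of a cysteine-containing peptide should appear in the MS scan as a pair of peaks separated by the linker mass difference. The Python sketch below shows one way such pairs might be picked out and a heavy/light ratio computed. It assumes the nine 13C substitutions implied by the 227 / 236 amu linker on the ICAT slide; the function name, tolerance and peak-list layout are illustrative and are not taken from the original ICAT software.

```python
# Assumption: nine 12C -> 13C substitutions per tag (227 vs 236 amu linker).
C13_C12_DELTA = 1.003355          # mass difference between 13C and 12C (Da)
TAG_DELTA = 9 * C13_C12_DELTA     # about 9.03 Da per labelled cysteine


def find_icat_pairs(peaks, charge=1, max_cys=2, tol=0.02):
    """Find light/heavy peak pairs in a list of (m/z, intensity) tuples.

    Returns (light_mz, heavy_mz, n_cys, heavy/light ratio) for every pair
    whose spacing matches n_cys labelled cysteines at the given charge.
    """
    peaks = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(peaks):
        for mz_heavy, int_heavy in peaks[i + 1:]:
            for n_cys in range(1, max_cys + 1):
                expected = n_cys * TAG_DELTA / charge
                if abs((mz_heavy - mz_light) - expected) <= tol and int_light > 0:
                    pairs.append((mz_light, mz_heavy, n_cys, int_heavy / int_light))
    return pairs


# Hypothetical peak list: one pair plus an unrelated peak.
example = [(1000.50, 200.0), (1009.53, 100.0), (1500.00, 50.0)]
pairs = find_icat_pairs(example)
```

A real analysis would also need charge-state assignment and isotope-envelope handling, which this sketch leaves out.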

Page 68: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

iTRAQ

Page 69: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Ross P. et al. Mol Cell Proteomics. 2004 Sep 22

Page 70: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

WORKFLOW

1. Reduce, alkylate (cysteine block) and digest the protein sample with trypsin as usual.

2. Label each sample (maximum of 4) with a different iTRAQ reagent; 100 µg of protein is optimal.

3. Combine all iTRAQ-labelled samples into one sample mixture.

4. Clean up the sample by cation-exchange chromatography.

5. For complex sample mixtures, pre-fractionate on a high-resolution cation-exchange column.

6. Analyse the mixture by LC/MS/MS.

7. Analyse the results with Pro Quant software.
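The quantitative read-out of this workflow comes from the reporter ions released from each tag during MS/MS. The Python sketch below shows how relative abundances across the four samples might be read from one MS/MS spectrum. It assumes the 4-plex reporter ions at nominal m/z 114, 115, 116 and 117; the function names, tolerance and choice of reference channel are illustrative and do not describe how Pro Quant works internally.

```python
# Assumption: 4-plex iTRAQ reporter ions at nominal m/z 114 to 117.
REPORTER_MZ = {"114": 114.1, "115": 115.1, "116": 116.1, "117": 117.1}


def reporter_intensities(msms_peaks, tol=0.05):
    """Sum the intensity found within `tol` of each reporter m/z."""
    found = {label: 0.0 for label in REPORTER_MZ}
    for mz, intensity in msms_peaks:
        for label, target in REPORTER_MZ.items():
            if abs(mz - target) <= tol:
                found[label] += intensity
    return found


def ratios_to_reference(intensities, reference="114"):
    """Express every channel relative to the chosen reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {label: value / ref for label, value in intensities.items()}


# Hypothetical MS/MS peak list: four reporter ions plus one fragment ion.
spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
            (117.12, 1000.0), (500.30, 9999.0)]
ratios = ratios_to_reference(reporter_intensities(spectrum))
```

In practice, ratios from several peptides of the same protein would be combined, and isotope-impurity corrections applied, before any biological interpretation.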

Page 72: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

PROTEIN TURNOVER

The missing dimension of proteomics

JM Pratt, J Petty, I Riba-Garcia, DHL Robertson, SJ Gaskell, SG Oliver, RJ Beynon (2002)

Molec. Cell. Proteomics 1, 579-591.

Page 73: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Experimental Approach

[Figure: protein labelling curve. x-axis: Time (h), 0 to 80, also annotated "Doubling times (0.1 h-1)"; y-axis: 0 to 1. A deuterated leucine labelling phase is followed by an unlabelled chase. Loss of label from proteins at different rates = turnover.]

Dilution rate = 0.1 h-1 (100 ml h-1)

Half-time = 6.9 h
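The half-time quoted on this slide follows from first-order washout at the dilution rate: t½ = ln 2 / D. The short Python sketch below reproduces the arithmetic. It assumes simple exponential washout in a steady-state chemostat; the function names are illustrative.

```python
import math


def half_time(rate_per_h):
    """Half-time (h) of a first-order process with the given rate constant."""
    return math.log(2) / rate_per_h


def fraction_remaining(rate_per_h, t_h):
    """Fraction of the starting material left after t_h hours."""
    return math.exp(-rate_per_h * t_h)


dilution_rate = 0.1                       # h^-1, from the slide
t_half = half_time(dilution_rate)         # about 6.9 h
left_after_t_half = fraction_remaining(dilution_rate, t_half)
```

A protein that is also degraded loses label faster than this washout curve, which is what allows turnover to be measured.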

Page 74: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Figure: mass spectra at 100%, 50% and 0% d9 labelling, m/z range approximately 1100 to 2350; peaks are annotated with the number of leucine residues in the peptide (L = 0 to L = 3).]

Pratt et al., Figure 3

Page 75: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Figure: mass spectra (relative intensity, %) of two peptides at 0, 4, 6, 8, 12, 25 and 51 h. Left panel, m/z 1520 to 1550: peaks at about 1529.9 and 1539.0, 9 Da apart (1 Leu). Right panel, m/z 2100 to 2130: peaks at about 2099.2 and 2126.4, 27 Da apart (3 Leu).]
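The spacing between the heavy and light forms of a peptide gives its leucine count directly, and the two peak intensities give the fraction still labelled. The Python sketch below illustrates both calculations. It assumes a nominal shift of 9 Da per d9-leucine, as annotated on the slide, and takes relative isotope abundance (RIA) to be heavy / (heavy + light), which is an assumption about the definition used; the function names are illustrative.

```python
D9_LEU_SHIFT = 9.0   # nominal mass shift (Da) per d9-leucine, from the slide


def count_leucines(light_mz, heavy_mz, charge=1):
    """Infer the number of leucines from the heavy/light peak spacing."""
    return round((heavy_mz - light_mz) * charge / D9_LEU_SHIFT)


def relative_isotope_abundance(heavy_intensity, light_intensity):
    """Assumed definition: RIA = heavy / (heavy + light)."""
    total = heavy_intensity + light_intensity
    if total <= 0:
        raise ValueError("no signal in either peak")
    return heavy_intensity / total


# Peak positions read from the two panels of the slide.
n_leu_left = count_leucines(1529.949, 1538.967)    # 9 Da spacing
n_leu_right = count_leucines(2099.240, 2126.389)   # 27 Da spacing
ria_example = relative_isotope_abundance(75.0, 25.0)  # hypothetical intensities
```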

Page 76: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Figure: RIA plotted against time (h) for NADP-glutamate dehydrogenase (GDH) (3 peptides), Hsp26 (2 peptides), Hsp71 (4 peptides) and pyruvate decarboxylase (PDC) (4 peptides), together with a bar chart of k loss (h-1) ± SEM, scale 0 to 0.16, for NADP-GDH, Hsp26, Hsp71 and PDC.]

Pratt et al., Figure 3
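A rate of label loss such as those plotted here can be estimated by fitting a straight line to ln(RIA) against time. The Python sketch below does this with an ordinary least-squares fit. It rests on two assumptions that the slides do not state explicitly: that RIA decays as a single exponential during the chase, and that the loss rate is the sum of the chemostat dilution rate (0.1 h-1) and the intracellular degradation rate, so that k_deg = k_loss − D. The data are synthetic.

```python
import math


def fit_k_loss(times_h, ria_values):
    """Least-squares slope of ln(RIA) versus time, returned as k_loss (h^-1)."""
    ys = [math.log(r) for r in ria_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    return -sxy / sxx


def degradation_rate(k_loss, dilution_rate=0.1):
    """Assumed relation: k_loss = dilution rate + degradation rate."""
    return k_loss - dilution_rate


# Synthetic chase data at the sampling times shown on the spectra slide.
times = [0, 4, 6, 8, 12, 25, 51]
ria = [math.exp(-0.12 * t) for t in times]

k_loss = fit_k_loss(times, ria)
k_deg = degradation_rate(k_loss)
```

With real data the fit would be done per peptide and the per-protein value reported with its standard error, as in the bar chart.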

Page 77: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

[Figure: degradation rate constant (h-1) ± SEM, scale 0 to 0.12, for each protein (spot IDs 1 to 52). Second panel: distribution (%) of degradation rate constants, scale 0 to 30, in the bins < 0.01, 0.01-0.02, 0.02-0.03, 0.03-0.04 and > 0.04 h-1.]

Pratt et al., Figure 5

Page 78: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

INTEGRATION

Page 79: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Evaluating protein-interaction data

von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417, 399-403.

Cornell M, Paton NW, Oliver SG (2004) A critical and integrated view of the yeast interactome. Comp. Funct. Genom. 5, 382-402.

Page 80: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

 

Fig. 1 How the two-hybrid system detects protein associations in yeast.

(A) The fusion of the “bait” protein (Protein A) and the DNA binding domain of the transcriptional activator cannot turn on the Reporter Gene.

(B) The fusion of the “prey” protein (Protein B) and the activating region of the transcriptional activator is also insufficient to switch on the reporter.

(C) The association of “bait” and “prey” brings the DNA binding domain and the activator region close enough to switch on the Reporter Gene and turn yeast blue.

[Figure: in each panel the Reporter Gene is LacZ, downstream of a UAS and promoter; transcription occurs only in (C).]

Page 81: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

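Schematic representation of the two hybrid system in case of interaction of protein A and B

[Figure: protein A, fused to the DNA-binding domain at the UAS, interacts with protein B, fused to the activation domain; RNA Pol II transcribes the reporter gene. Result: gene expression.]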

Page 82: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

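Schematic representation of the two hybrid system in absence of interaction of protein A and B

[Figure: protein A, fused to the DNA-binding domain at the UAS, does not associate with protein B, fused to the activation domain; RNA Pol II does not transcribe the reporter gene. Result: NO TRANSCRIPT.]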

Page 84: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Synthetic lethals

Definition: lethality is caused by mutating two or more genes

[Figure, two cases. Single essential pathway: gene1, gene2, gene3, gene4, gene5. Functionally overlapping pathways: gene1 to gene5 in one pathway and geneA, geneB, geneC in the other.]
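The logic of the overlapping-pathways case can be made concrete with a toy model. The Python sketch below assumes a cell is viable as long as at least one pathway has no mutated gene, and then lists the gene pairs that are individually dispensable but lethal in combination. The viability rule is a deliberate simplification, and the gene names are the placeholders from the slide.

```python
from itertools import combinations

PATHWAY_1 = {"gene1", "gene2", "gene3", "gene4", "gene5"}
PATHWAY_2 = {"geneA", "geneB", "geneC"}


def viable(mutated, pathways):
    """Toy rule: viable if at least one pathway contains no mutated gene."""
    return any(not (pathway & mutated) for pathway in pathways)


def synthetic_lethal_pairs(pathways):
    """Pairs of genes that are each dispensable alone but lethal together."""
    genes = sorted(set().union(*pathways))
    pairs = []
    for a, b in combinations(genes, 2):
        singles_ok = viable({a}, pathways) and viable({b}, pathways)
        if singles_ok and not viable({a, b}, pathways):
            pairs.append((a, b))
    return pairs


overlapping = synthetic_lethal_pairs([PATHWAY_1, PATHWAY_2])
single_pathway = synthetic_lethal_pairs([PATHWAY_1])
```

Under this rule every cross-pathway pair is synthetically lethal, while a single essential pathway yields none because each single mutation is already lethal.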

Page 85: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Asparagine-linked Glycosylation

Dol-PP-GlcNAc2Man9Glc3 (substrate) + Asp(-NH2)-X-Ser/Thr → Asp(-NH-GlcNAc2Man9Glc3)-X-Ser/Thr

Oligosaccharyltransferase genes: STT3, OST1, WBP1, OST3, OST6, SWP1, OST2, OST5, OST4

alg mutations are synthetically lethal with a conditional mutation affecting oligosaccharyltransferase activity

(ALG genes are responsible for the core synthesis)

Page 86: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Integrating complex data with yeast two-hybrid data

[Figure: a complex drawn as six subunits, A to F]

Complex consists of six proteins: A, B, C, D, E, F

In a yeast two-hybrid experiment, A interacts with another protein

Is it B, C, D, E or F?

Page 87: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Large-scale interaction data and the distribution of interactions according to functional categories.

Page 88: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Quantitative comparison of interaction datasets.

Page 89: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Set of confirmed Y2H interactions

Confirmation of an interaction requires:

1. Identification in more than one Y2H screen, OR

2. The reverse interaction must have been identified, OR

3. The two proteins must have been identified in the same protein complex (from either classical or high-throughput affinity purification studies).

A total of 451 reliable interactions, involving 581 proteins, have been identified from a combined data set comprising 5214 interactions and 4025 proteins.
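The three confirmation criteria amount to a simple filter over the combined data set. The Python sketch below shows one way it might be written. The data layout (a dictionary of screens holding directed bait/prey pairs, and a list of complexes as sets of protein names) and all identifiers are hypothetical; this is not the procedure actually used to obtain the 451 interactions.

```python
from collections import Counter


def confirmed_interactions(screens, complexes):
    """Return the unordered protein pairs that satisfy at least one criterion.

    screens:   dict mapping screen name -> set of (bait, prey) tuples
    complexes: list of sets of protein names
    """
    directed = set().union(*screens.values())
    screen_count = Counter()
    for pairs in screens.values():
        for pair in {frozenset(p) for p in pairs}:
            screen_count[pair] += 1

    confirmed = set()
    for bait, prey in directed:
        pair = frozenset((bait, prey))
        in_multiple_screens = screen_count[pair] > 1                  # criterion 1
        reverse_seen = bait != prey and (prey, bait) in directed      # criterion 2
        same_complex = any(bait in c and prey in c for c in complexes)  # criterion 3
        if in_multiple_screens or reverse_seen or same_complex:
            confirmed.add(pair)
    return confirmed


# Hypothetical toy data.
screens = {
    "screen_1": {("A", "B"), ("C", "D"), ("E", "F")},
    "screen_2": {("A", "B"), ("D", "C"), ("G", "H")},
}
complexes = [{"E", "F", "X"}]
reliable = confirmed_interactions(screens, complexes)
```

Here A-B is seen in two screens, C-D is seen in both orientations, E-F shares a complex, and G-H is left unconfirmed.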


Page 91: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Taylor CF, Paton NW, Garwood KL, Kirby PD, Stead DA, Yin Z, Deutsch EW, Selway L, Walker J, Riba-Garcia I, Mohammed S, Deery MJ, Howard JA, Dunkley T, Aebersold R, Kell DB, Lilley KS, Roepstorff P, Yates JR III, Brass A, Brown AJP, Cash P, Gaskell SJ, Hubbard SJ, Oliver SG (2003) PEDRo: A Systematic Approach to Modelling, Capturing and Disseminating Proteomics Data. Nature Biotechnol. 21, 247-254.

Garwood K, McLaughlin T, Garwood C, Joens S, Morrison N, Taylor CF, Carroll K, Evans C, Whetton AD, Hart S, Stead D, Yin Z, Brown AJP, Hesketh A, Chater K, Hansson L, Mewissen M, Ghazal P, Howard J, Lilley KS, Gaskell SJ, Brass A, Hubbard SJ, Oliver SG, Paton NW (2004) PEDRo: A database for storing, searching and disseminating experimental proteomics data. BMC Genomics 5, 68. doi:10.1186/1471-2164-5-68.

Page 92: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Proteomics: the state of play

• The volume of generated proteome data is rapidly increasing
  – Movement towards high-throughput approaches
  – Experimental techniques increasing in complexity
  – Analyses also increasing in complexity

• Current publicly available proteomics data is limited
  – 2D-gel image databases (e.g. SWISS-2DPAGE) contain little information about sample preparation, or analysis of results
  – No widely used databases of mass spectrometry data or analyses

• A robust, future-proofed, standard representation of both methods and data from proteomics experiments is required
  – Analogous to the MIAME guidelines for transcriptomics
  – Users will know what to expect from datasets (formats etc.)
  – Will facilitate handling, exchange and dissemination of data
  – Will guide the development of effective search/analysis tools

Page 93: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

PEDRo and PEML

• The PEDRo (Proteome Experiment Data Repository) model
  – Specifies the information required about a proteomics experiment
    • sufficient information to exactly replicate that experiment
  – Organised in a manner reflecting the procedures that generated it
  – Flexible enough to accommodate new technological developments
  – Described in UML (Unified Modeling Language), making it implementation-independent (effectively a generic blueprint)
    • Implemented in SQL (the relational database repository)
    • Also implemented in Java (later slide), and XML (next bullet)

• PEML (Proteomics Experiment Markup Language)
  – The XML implementation of PEDRo for data exchange and rapid dissemination (using XSLT to display PEML files as web pages)

• Two benefits arising from early implementation of the model
  – Implementation allows the underlying technologies to be tested
  – Making explicit what data might most usefully be captured about proteomics experiments will speed the model’s evolution
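To give a feel for what an XML encoding of such a model might look like, the Python sketch below builds a small PEML-like document with the standard library. The element and attribute names are borrowed from class and attribute names in the PEDRo UML diagram, but their nesting here is a guess: the slides do not show the real PEML layout, so this is an illustration only and not valid PEML.

```python
import xml.etree.ElementTree as ET


def build_peml_like(hypothesis, sample_id, species, peaks):
    """Serialise a minimal experiment record as XML text (hypothetical layout)."""
    experiment = ET.Element("Experiment")
    ET.SubElement(experiment, "hypothesis").text = hypothesis

    sample = ET.SubElement(experiment, "Sample", sample_id=sample_id)
    ET.SubElement(sample, "Organism", species_name=species)

    peak_list = ET.SubElement(experiment, "PeakList")
    for m_to_z, abundance in peaks:
        ET.SubElement(peak_list, "Peak",
                      m_to_z=f"{m_to_z:.3f}", abundance=f"{abundance:.1f}")

    return ET.tostring(experiment, encoding="unicode")


# Hypothetical record with a single peak.
xml_text = build_peml_like("Turnover differs between proteins", "S1",
                           "Saccharomyces cerevisiae", [(1538.967, 100.0)])
parsed = ET.fromstring(xml_text)
```

Because the record is plain XML, the same file could be stored, exchanged, or transformed for display, which is the role the slides give to PEML and XSLT.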

Page 94: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

The nature of proteomics experiment data

• Sample generation
  – Origin of sample
    • hypothesis, organism, environment, preparation, paper citations

• Sample processing
  – Gels (1D/2D) and columns
    • images, gel type and ranges, band/spot coordinates
    • stationary and mobile phases, flow rate, temperature, fraction details

• Mass Spectrometry
    • machine type, ion source, voltages

• In Silico analysis
    • peak lists, database name + version, partial sequence, search parameters, search hits, accession numbers

Page 95: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

The PEDRo UML schema in reduced form

[Figure: UML class diagram showing class names only: Experiment, SampleOrigin, Sample, Organism, TaggingProcess, Analyte, OtherAnalyte, TreatedAnalyte, ChemicalTreatment, AnalyteProcessingStep, OtherAnalyteProcessingStep, Gel, Gel1D, Gel2D, DiGEGel, GelItem, Band, Spot, DiGEGelItem, BoundaryPoint, RelatedGelItem, Column, GradientStep, MobilePhaseComponent, PercentX, Fraction, AssayDataPoint, MassSpecExperiment, MassSpecMachine, IonSource, MALDI, Electrospray, OtherIonisation, mzAnalysis, ToF, Quadrupole, CollisionCell, IonTrap, Hexapole, OthermzAnalysis, Detection, OntologyEntry, PeakList, Peak, ListProcessing, ChromatogramPoint, PeakSpecificChromatogramIntegration, MSMSFraction, TandemSequenceData, DBSearch, DBSearchParameters, PeptideHit, ProteinHit, Protein.]

Page 96: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

PEDRo UML Class Diagram

Key to colours: Sample Generation, Sample Processing, Mass Spectrometry, MS Results Analysis

[Figure: full UML class diagram. Classes and their attributes, as far as they can be read:]

Experiment: hypothesis, method_citations, result_citations
Sample: sample_id, sample_date, experimenter
SampleOrigin: description, condition, condition_degree, environment, tissue_type, cell_type, cell_cycle_phase, cell_component, technique, metabolic_label
Organism: species_name, strain_identifier, relevant_genotype
TaggingProcess: lysis_buffer, tag_type, tag_purity, protein_concentration, tag_concentration, final_volume
Analyte
OtherAnalyte: name
TreatedAnalyte
ChemicalTreatment: digestion, derivatisations
AnalyteProcessingStep
OtherAnalyteProcessingStep: name
Gel: description, raw_image, annotated_image, software_version, warped_image, warping_map, equipment, percent_acrylamide, solubilization_buffer, stain_details, protein_assay, in-gel_digestion, background, pixel_size_x, pixel_size_y
Gel1D: denaturing_agent, mass_start, mass_end, run_details
Gel2D: pi_start, pi_end, mass_start, mass_end, first_dim_details, second_dim_details
DiGEGel: dye_type, excitation_wavelength, exposure_time, tiff_image
GelItem: id, area, intensity, local_background, annotation, annotation_source, volume, pixel_x_coord, pixel_y_coord, pixel_radius, normalisation, normalised_volume
Band: lane_number, apparent_mass
Spot: apparent_pi, apparent_mass
DiGEGelItem: dye_type
BoundaryPoint: pixel_x_coord, pixel_y_coord
RelatedGelItem: description, gel_reference, item_reference
Column: description, manufacturer, part_number, batch_number, internal_length, internal_diameter, stationary_phase, bead_size, pore_size, temperature, flow_rate, injection_volume, parameters_file
GradientStep: step_time
MobilePhaseComponent: description, concentration
PercentX: percentage
Fraction: start_point, end_point, protein_assay
AssayDataPoint: time, protein_assay
MassSpecExperiment: description, parameters_file
MassSpecMachine: manufacturer, model_name, software_version
IonSource: type, collision_energy
MALDI: laser_wavelength, laser_power, matrix_type, grid_voltage, acceleration_voltage, ion_mode
Electrospray: spray_tip_voltage, spray_tip_diameter, solution_voltage, cone_voltage, loading_type, solvent, interface_manufacturer, spray_tip_manufacturer
OtherIonisation: name
mzAnalysis: type
ToF: reflectron_state, internal_length
Quadrupole: description
CollisionCell: gas_type, gas_pressure, collision_offset
IonTrap: gas_type, gas_pressure, rf_frequency, excitation_amplitude, isolation_centre, isolation_width, final_ms_level
Hexapole: description
OthermzAnalysis: name
Detection: type
Peak: m_to_z, abundance, multiplicity
ListProcessing: smoothing_process, background_threshold
ChromatogramPoint: time_point, ion_count
PeakSpecificChromatogramIntegration: resolution, software_version, background_threshold, area_under_curve, peak_description, sister_peak_reference
MSMSFraction: target_m_to_z, plus_or_minus
TandemSequenceData: source_type, sequence
DBSearch: username, id_date, n-terminal_aa, c-terminal_aa, count_of_specific_aa, name_of_counted_aa, regex_pattern
DBSearchParameters: program, database, database_date, parameters_file, taxonomical_filter, fixed_modifications, variable_modifications, max_missed_cleavages, mass_value_type, fragment_ion_tolerance, peptide_mass_tolerance, accurate_mass_mode, mass_error_type, mass_error, protonated, icat_option
PeptideHit: score, score_type, sequence
ProteinHit: all_peptides_matched
Protein: accession_number, gene_name, synonyms, organism, orf_number, description, sequence, modifications, predicted_mass, predicted_pi
OntologyEntry: category, value, description

Named associations: ionisation_parameters, mz_analysis_parameters, analyte_parameters, analyte_processing_step_parameters, db_search_parameters, peptide_hit_parameters, has_children

Page 97: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

The Framework Around PEDRo

1. Lab-generated data are encoded using the PEDRo data entry tool, producing an XML (PEML) file for local storage or submission

2. Locally stored PEML files may be viewed in a web browser (with XSLT), allowing web pages to be quickly generated from datasets

3. Upon receipt of a PEML file at the repository site, a validation tool checks the file before entering it into the database

4. The repository (a relational database) holds submitted data, allowing various analyses to be performed, or data to be extracted as a PEML file or another format
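Steps 3 and 4 can be pictured as a validate-then-store routine. The Python sketch below checks an incoming file before inserting it into a relational table. Everything specific in it is hypothetical: the required elements, the table layout and the use of SQLite are stand-ins chosen so that the example is self-contained, and say nothing about how the real PEDRo validation tool or repository is built.

```python
import sqlite3
import xml.etree.ElementTree as ET

REQUIRED_PATHS = ["hypothesis", "Sample"]   # hypothetical minimum content


def validate(peml_text):
    """Return a list of problems; an empty list means the file passed."""
    try:
        root = ET.fromstring(peml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "Experiment":
        problems.append("root element is not Experiment")
    for path in REQUIRED_PATHS:
        if root.find(path) is None:
            problems.append(f"missing element: {path}")
    return problems


def submit(conn, peml_text):
    """Store the file only if validation finds no problems."""
    problems = validate(peml_text)
    if problems:
        return problems
    conn.execute("CREATE TABLE IF NOT EXISTS submission "
                 "(id INTEGER PRIMARY KEY, peml TEXT)")
    conn.execute("INSERT INTO submission (peml) VALUES (?)", (peml_text,))
    conn.commit()
    return []


conn = sqlite3.connect(":memory:")
accepted = submit(conn, "<Experiment><hypothesis>h</hypothesis><Sample/></Experiment>")
rejected = submit(conn, "<Experiment/>")
stored = conn.execute("SELECT COUNT(*) FROM submission").fetchone()[0]
```

Rejecting a file before it reaches the database keeps the repository consistent and gives the submitter a specific list of what to fix.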

Page 98: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

The PEDRo Data Collator

• The tool with which a user enters information about, and data from, proteomics experiments
  – The tool collates these data into a single PEML file
  – The hierarchical nature of the PEDRo schema (and PEML) is reflected in the structure of the data entry tool
    • Successive stages of the experimental design are added as ‘children’ of the previous stage
    • Enforces an audit trail for data; e.g. details of a gel cannot be entered without first describing the sample

• A simple, filterable list of all the sub-records present and tree-style browser act as ‘index’ and ‘contents’ for the PEML file being edited
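The audit-trail behaviour described above can be sketched as a tree in which each record type may only be attached to a particular kind of parent. The Python example below is a minimal illustration; the four-level hierarchy and the class name are hypothetical simplifications, not the structure of PEDRoDC itself.

```python
# Hypothetical, simplified hierarchy: each kind names its required parent kind.
ALLOWED_PARENT = {
    "Experiment": None,
    "Sample": "Experiment",
    "Gel": "Sample",
    "Spot": "Gel",
}


class Record:
    """A data-entry record that can only be created under the right parent."""

    def __init__(self, kind, parent=None, **fields):
        expected = ALLOWED_PARENT[kind]
        actual = parent.kind if parent is not None else None
        if expected != actual:
            raise ValueError(f"{kind} must be added under {expected}, not {actual}")
        self.kind = kind
        self.parent = parent
        self.fields = fields
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def outline(self, depth=0):
        """Indented listing of this record and everything beneath it."""
        lines = ["  " * depth + self.kind]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


experiment = Record("Experiment", hypothesis="example")
sample = Record("Sample", experiment, sample_id="S1")
gel = Record("Gel", sample, description="2D gel")

try:
    Record("Gel", experiment)        # no sample described first
    skipped_sample_allowed = True
except ValueError:
    skipped_sample_allowed = False
```

The `outline` method plays the part of the tree-style browser: it lists every sub-record in the order in which the experiment was described.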

Page 100: Experimental & Bioinformatic Tools for Proteomics Steve Oliver Professor of Genomics Faculty of Life Sciences The University of Manchester

Conclusions

• The PEDRo model does require a substantial amount of data
  – Much of this information will be available in the lab of origin
  – Some data will be common to many experiments, and therefore need only be entered once, then saved as a template in PEDRoDC

• But there are several advantages to adopting such a model
  – All datasets will contain information sufficient to quickly establish the provenance and relevance (to the researcher) of a dataset
  – Datasets will be detailed enough to allow non-standard searches, for example, by sample extraction technique
  – Tools can be developed that allow easy access to large numbers of such datasets, from a wide range of proteomics sites
  – Integration with other resources, such as the major sequence databases, will provide sophisticated search and analysis capability
  – Information exchange between researchers will be facilitated through the use of a common language (PEML), and the ability to rapidly display PEML-encoded data as a web page