mapping the sub-cellular proteome · 2015-12-04 · spatial proteomics - why? mis-localisation...

Mapping the sub-cellular proteome

Computational analyses of high-throughput mass spectrometry-based spatial proteomics data

Laurent Gatto – [email protected] – @lgatt0

Computational Proteomics Unit, http://cpu.sysbiol.cam.ac.uk/

http://lgatto.github.io/

(Slides @ http://goo.gl/SZRMjg)

14 Oct 2015, CCBI

Plan

Introduction

Spatial proteomics

Data analysis

Transfer learning

Dynamics

Regulations

Cell organisation

Spatial proteomics is the systematic study of protein localisations.

Image from Wikipedia http://en.wikipedia.org/wiki/Cell_(biology).

Spatial proteomics - Why?

Mis-localisation: Disruption of the targeting/trafficking process alters proper sub-cellular localisation, which in turn perturbs the cellular functions of the proteins.

• Abnormal protein localisation leading to the loss of functional effects in diseases (Laurila and Vihinen, 2009).

• Disruption of nuclear/cytoplasmic transport (nuclear pores) has been detected in many types of carcinoma cells (Kau et al., 2004).

Re-localisation in:

• Differentiation: Tfe3 in mouse ESC (Betschinger et al., 2013).

• Metabolism: changes in carbon sources, elemental limitations.

Plan

Introduction

Spatial proteomics

Data analysis

Transfer learning

Dynamics

Spatial proteomics - How, experimentally

[Diagram: single-cell direct observation by tagging (GFP, epitope, protein-specific antibody) versus population-level subcellular fractionation with quantitative mass spectrometry, ordered by number of fractions from cataloguing to relative abundance: 1 fraction (pure fraction catalogue); 2 fractions, enriched and crude (subtractive proteomics, enrichment); n discrete fractions (invariant rich fraction, clustering); n continuous fractions, i.e. gradient approaches (PCP, χ²; LOPIT, PCA, PLS-DA).]

Figure: Organelle proteomics approaches (Gatto et al., 2010)

Fusion proteins and immunofluorescence

Figure: Example of discrepancies between immunofluorescence (IF) and fluorescent protein (FP) fusions, as well as between FP tagging at the N and C termini (Stadler et al., 2013).

Spatial proteomics - How, experimentally

[Diagram: organelle proteomics approaches, as on the previous slide.]

Figure: Organelle proteomics approaches (Gatto et al., 2010). Gradient approaches: Dunkley et al. (2006), Foster et al. (2006).

⇒ Explorative/discovery approaches, global localisation maps.

[Workflow: cell lysis, then fractionation/centrifugation separating organelles (e.g. mitochondrion), then quantitation/identification by mass spectrometry.]

Plan

Introduction

Spatial proteomics

Data analysis

Transfer learning

Dynamics

Quantitation data and organelle markers

        Fraction1  Fraction2  ...  Fractionm  markers
p1      q1,1       q1,2       ...  q1,m       unknown
p2      q2,1       q2,2       ...  q2,m       loc1
p3      q3,1       q3,2       ...  q3,m       unknown
p4      q4,1       q4,2       ...  q4,m       loci
...     ...        ...        ...  ...        ...
pj      qj,1       qj,2       ...  qj,m       unknown
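To make this layout concrete, here is a minimal Python sketch (standard library only) of the same structure: a protein-by-fraction matrix of relative abundances plus a markers annotation, split into labelled markers (the training data) and proteins of unknown localisation. The protein names, values, labels and the helper `split_markers` are hypothetical illustrations, not part of pRoloc.

```python
# Minimal sketch: protein x fraction quantitation matrix plus a "markers" column.
# All protein names, values and labels are hypothetical.
from typing import Dict, List, Tuple

# Relative abundance of each protein across m = 4 fractions (rows sum to 1).
profiles: Dict[str, List[float]] = {
    "p1": [0.10, 0.20, 0.30, 0.40],
    "p2": [0.70, 0.20, 0.05, 0.05],
    "p3": [0.15, 0.25, 0.25, 0.35],
    "p4": [0.05, 0.05, 0.20, 0.70],
}

# Marker annotation: a known sub-cellular location, or "unknown".
markers: Dict[str, str] = {"p1": "unknown", "p2": "loc1", "p3": "unknown", "p4": "loc2"}


def split_markers(profiles: Dict[str, List[float]], markers: Dict[str, str]
                  ) -> Tuple[Dict[str, Tuple[List[float], str]], Dict[str, List[float]]]:
    """Separate labelled marker proteins (training data) from unknown proteins."""
    labelled = {p: (q, markers[p]) for p, q in profiles.items() if markers[p] != "unknown"}
    unlabelled = {p: q for p, q in profiles.items() if markers[p] == "unknown"}
    return labelled, unlabelled


labelled, unlabelled = split_markers(profiles, markers)
print(sorted(labelled))    # ['p2', 'p4']
print(sorted(unlabelled))  # ['p1', 'p3']
```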

Annotated data sets

• Several mouse E14TG2a embryonic stem cell data sets.

• Human embryonic kidney fibroblast cells.

• The Arabidopsis AT_CHLORO database (Ferro et al., 2010).

• Mouse organs (Foster et al., 2006).

• Arabidopsis from callus (Dunkley et al., 2006; Nikolovski et al., 2014) and roots (Groen et al., 2014).

• Drosophila embryos (Tan et al., 2009).

• Chicken DT40 lymphocyte cells (Hall et al., 2009).

• ...

Available in the pRolocdata experiment package.

[Figure: Spatial/organelle(s) proteomics papers: number of PMIDs per year, 1996 to 2013.]

Visualisation and classification

[Figure: correlation profiles along the fractions for ER, Golgi, mit/plastid, PM and vacuole proteins; principal component analysis (PC1 vs PC2) showing ER, Golgi, mit/plastid, PM and vacuole markers, PLS-DA assignments and unknown proteins.]

Figure: From Gatto et al. (2010), Arabidopsis thaliana data from Dunkley et al. (2006).
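The PCA plots throughout these slides project each protein's fraction profile onto the first two principal components, and the axis labels report the percentage of variance each component explains. A minimal, dependency-free sketch of that projection follows. It uses power iteration with deflation on the sample covariance matrix, which is an implementation choice for illustration only (not how pRoloc computes PCA), and `pca_2d` is a hypothetical name.

```python
import math
import random
from typing import List, Tuple


def pca_2d(data: List[List[float]], n_iter: int = 500, seed: int = 0
           ) -> Tuple[List[List[float]], List[float]]:
    """Project the rows of `data` (proteins x fractions) onto the first two
    principal components; return (scores, explained variance ratios).

    Power iteration with deflation on the sample covariance matrix; assumes
    at least two rows and non-constant data with rank >= 2."""
    n, m = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(m)]
    centred = [[row[k] - means[k] for k in range(m)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    total_var = sum(cov[k][k] for k in range(m))
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(2):
        v = [rng.random() + 0.1 for _ in range(m)]  # random positive start vector
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m))
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: subtract the found component so the next search finds PC2.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    scores = [[sum(r[k] * c[k] for k in range(m)) for c in components] for r in centred]
    return scores, [ev / total_var for ev in eigenvalues]


data = [[-2, 1], [-1, 0], [0, -2], [1, 0], [2, 1]]
scores, ratios = pca_2d(data)
print([round(r, 3) for r in ratios])  # [0.625, 0.375]
```

Multiplying the ratios by 100 gives the percentages shown in axis labels such as "PC1 (64.36%)".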

Data analysis

Quantitative data for proteins prot1 ... protn (rows) across Fraction1 ... Fractionm (columns): protein proti has values qi,1, qi,2, ..., qi,m. A markers column labels each protein either with one of organelle1 ... organellek or as unknown.

[Figure: Principal Component Analysis plot of the same data, PC1 (64.36%) vs PC2 (22.34%).]

Supervised machine learning

Using labelled marker proteins to match unlabelled proteins (of unknown localisation) with similar profiles, and to classify them as residents of the markers' organelle class.
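A minimal sketch of this idea with a k-nearest-neighbour classifier (one of the classifiers compared later in these slides), written in plain Python: each protein of unknown localisation receives the majority organelle label among its k closest marker profiles, using Euclidean distance. Marker names and profiles are hypothetical; pRoloc provides its own implementations, which are not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def knn_classify(markers: Dict[str, Tuple[List[float], str]],
                 unknowns: Dict[str, List[float]], k: int = 3) -> Dict[str, str]:
    """Give each unknown protein the majority label of its k nearest marker profiles."""
    predictions = {}
    for protein, profile in unknowns.items():
        # Rank all markers by Euclidean distance to this protein's profile.
        nearest = sorted(markers.values(), key=lambda m: math.dist(profile, m[0]))[:k]
        votes = Counter(label for _, label in nearest)
        predictions[protein] = votes.most_common(1)[0][0]
    return predictions


# Hypothetical marker profiles over three fractions.
markers = {
    "m1": ([0.80, 0.10, 0.10], "ER"),
    "m2": ([0.70, 0.20, 0.10], "ER"),
    "m3": ([0.10, 0.10, 0.80], "PM"),
    "m4": ([0.20, 0.10, 0.70], "PM"),
    "m5": ([0.75, 0.15, 0.10], "ER"),
}
unknowns = {"u1": [0.78, 0.12, 0.10], "u2": [0.15, 0.10, 0.75]}
print(knn_classify(markers, unknowns, k=3))  # {'u1': 'ER', 'u2': 'PM'}
```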

Current approaches - supervised ML

[Figure: SVM parameter optimisation over a grid of sigma (0.01 to 1000) and cost (0.0625 to 16) values, with classification scores from 0.5 to 1.0; PCA plots, PC1 (64.36%) vs PC2 (22.34%), of the classification results with optimised parameters and with wrong parameters.]

Figure: Support vector machine classifier with a radial basis function kernel, using the pRoloc Bioconductor package¹ (Gatto et al., 2014).

¹ www.bioconductor.org/packages/release/bioc/html/pRoloc.html
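The point of this slide is that classifier parameters must be optimised, typically by cross-validation scored with the F1 measure (the score used in the classifier comparison that follows). pRoloc does this for the SVM sigma and cost parameters in R; the sketch below shows the same loop in plain Python but swaps in a kNN classifier and tunes k, because an SVM does not fit in a short dependency-free example. The round-robin fold scheme, the grid and all function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[List[float], str]  # (fraction profile, organelle label)


def knn_predict(train: Sequence[Example], x: List[float], k: int) -> str:
    """Majority label among the k training profiles closest to x."""
    nearest = sorted(train, key=lambda ex: math.dist(x, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def macro_f1(truth: List[str], pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(truth) | set(pred)):
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


def cv_score(data: List[Example], k: int, folds: int = 3) -> float:
    """Mean macro F1 over round-robin cross-validation folds."""
    total = 0.0
    for f in range(folds):
        test = data[f::folds]
        train = [ex for i, ex in enumerate(data) if i % folds != f]
        preds = [knn_predict(train, x, k) for x, _ in test]
        total += macro_f1([y for _, y in test], preds)
    return total / folds


def tune_k(data: List[Example], grid: Sequence[int] = (1, 3, 5, 7)) -> Tuple[int, float]:
    """Return the k with the best cross-validated macro F1 (ties go to the larger k)."""
    best_score, best_k = max((cv_score(data, kk), kk) for kk in grid)
    return best_k, best_score
```

With a poorly chosen parameter (for example a k close to the size of the smallest class), the cross-validated F1 drops, which is the kind of effect the "wrong parameters" panel illustrates for the SVM.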

[Figure: F1 scores (0.5 to 1.0) of the knn, nb, nnet, plsda, rf and svm classifiers on the Tan.PD, Tan, Dunkley.PD, Dunkley, Andy.PD, Andy, AT_CHLORO, Nikolovski and Nikolovski.Imp data sets.]

Figure: Comparing classifiers

Limitations

[Figure: PCA plot, PC1 (58.53%) vs PC2 (29.96%), with ER/Golgi, mitochondrion, PM and unknown proteins.]

Incomplete annotation, and therefore lack of training data, for many/most organelles. Drosophila data from Tan et al. (2009).

Novelty detection

[Figure: two PCA plots, PC1 (58.53%) vs PC2 (29.96%). Left: ER/Golgi, mitochondrion, PM and unknown proteins. Right: cytoskeleton, ER, Golgi, lysosome, mitochondrion, nucleus, peroxisome, PM, proteasome, ribosome 40S and ribosome 60S.]

Figure: Left: Drosophila data from Tan et al. (2009). Right: semi-supervised learning, Breckels et al. (2013).

[Figure: the two PCA plots of the previous slide, alongside the following algorithm flowchart.]

Input data: $D = (D_L, D_U)$.

Repeat N times; within each repetition, consider each class i in turn until all classes have been considered:

• Phenotype modelling: select $D_L^i$ and model $F = D_L^i \cup D_U$ using a GMM (number of clusters estimated using BIC).

• Get candidates: members of $D_U$ clustered with $D_L^i$ are considered candidates of class i.

• Outlier test: each candidate is tested against an outlier detection algorithm. Candidates classified as members of i are merged with $D_L^i$; those which are rejected are returned to $D_U$.

Update classes: examples in $D_U$ that are consistently accepted into a single class i are labelled as members of $D_L^i$.

New phenotype: any example of $D_U$ not merged with any $D_L^i$ and consistently clustered together throughout the N iterations is considered a member of a new phenotype.

Output: return unassigned examples, new $D_L^i$ members and new phenotypes.
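The "new phenotype" step relies on proteins being clustered together in every one of the N iterations. A minimal plain-Python sketch of just that check is shown below. It assumes the per-iteration cluster assignments of the proteins not merged with any class are already available (the GMM/BIC modelling and outlier tests are not reproduced), and `consistent_groups` and its `min_size` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def consistent_groups(assignments: Sequence[Dict[str, int]], min_size: int = 2) -> List[List[str]]:
    """Group proteins that share a cluster in every one of the N iterations.

    assignments[t][protein] is the cluster id of an unassigned protein at
    iteration t; ids only need to be consistent within one iteration."""
    groups = defaultdict(list)
    for protein in assignments[0]:
        # The tuple of cluster ids across iterations is identical for two
        # proteins exactly when they were always clustered together.
        signature = tuple(run[protein] for run in assignments)
        groups[signature].append(protein)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Hypothetical cluster assignments of five unassigned proteins over N = 3 iterations.
runs = [
    {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 0},
    {"u1": 5, "u2": 5, "u3": 5, "u4": 2, "u5": 3},
    {"u1": 1, "u2": 1, "u3": 0, "u4": 0, "u5": 1},
]
print(consistent_groups(runs))  # [['u1', 'u2']]
```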

Plan

Introduction

Spatial proteomics

Data analysis

Transfer learning

Dynamics

What about annotation data from repositories such as GO, sequence features, signal peptide, transmembrane domains, images, ...?

• From a user perspective: "free/cheap" vs. expensive.

• Abundant (all proteins, 100s of features) vs. (experimentally) limited/targeted (1000s of proteins, 6 to 20 features).

• For localisation in the system at hand: low vs. high quality.

• Static vs. dynamic.

Number of GO features ≫ number of experimental fractions ⇒ dilution of the experimental data.
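A quick numerical illustration of this dilution, under stated assumptions: two random proteins, each with a normalised profile over 8 fractions and a sparse binary vector of GO features (5% of terms set, a hypothetical density), compared by Euclidean distance on the concatenated features. The share of the squared distance contributed by the experimental fractions collapses as the number of GO features grows. The random data and the function name are illustrative only.

```python
import random
from typing import List


def experimental_share(n_fractions: int, n_go: int, go_density: float = 0.05, seed: int = 1) -> float:
    """Share of the squared Euclidean distance between two random proteins that
    comes from the experimental fractions once binary GO features are appended."""
    rng = random.Random(seed)

    def profile() -> List[float]:
        raw = [rng.random() for _ in range(n_fractions)]
        total = sum(raw)
        return [x / total for x in raw]  # normalised abundance profile

    def go_vector() -> List[float]:
        return [1.0 if rng.random() < go_density else 0.0 for _ in range(n_go)]

    a, b = profile(), profile()
    go_a, go_b = go_vector(), go_vector()
    d_exp = sum((x - y) ** 2 for x, y in zip(a, b))
    d_go = sum((x - y) ** 2 for x, y in zip(go_a, go_b))
    return d_exp / (d_exp + d_go)


print(experimental_share(8, 0))    # 1.0: experimental data only
print(experimental_share(8, 500))  # a small fraction: GO features dominate the distance
```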

Goal: Support/complement the primary target domain (experimental data) with auxiliary data (annotation) features without compromising the integrity of our primary data.

Updated experimental design for

• primary/experimental data

and

• auxiliary/annotation data

Learning from heterogeneous data sources: an application in spatial proteomics. Breckels LM, Holden S, Wonjar D, Mulvey CM, Christoforou A, Groen AJ, Kohlbacher O, Lilley KS and Gatto L. bioRxiv pre-print http://dx.doi.org/10.1101/022152.

[Workflow: cell lysis, fractionation/centrifugation (e.g. mitochondrion) and quantitation/identification by mass spectrometry produce the primary experimental data; a database query, extraction of GO CC terms and conversion of the terms to binary produce the auxiliary data. Both data sets are visualised.]

Auxiliary data (binary GO CC matrix, proteins x1 ... xn by terms GO1 ... GOA):

          GO:0016021  GO:0005789  GO:0005783  ...
O00767    1           1           1           ...
P51648    1           1           0           ...
Q2TAA5    1           1           0           ...
Q9UKV5    0           0           0           ...
...

Primary experimental data (proteins x1 ... xn by channels X113 ... X121):

          X113    X114   X115    X116   X117   X118    X119    X121
O00767    0.1361  0.150  0.1062  0.147  0.277  0.1429  0.0380  0.00338
P51648    0.1914  0.205  0.0566  0.165  0.237  0.0996  0.0180  0.02727
Q2TAA5    0.1297  0.201  0.0546  0.146  0.292  0.1463  0.0206  0.00902
Q9UKV5    0.0939  0.207  0.0419  0.204  0.344  0.1098  0.0000  0.00000
...
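The "convert terms to binary" step amounts to building a protein-by-term indicator matrix. A plain-Python sketch follows, using only the four accessions and three GO CC terms visible in the figure; the database query itself is not shown, real annotation sets are larger, and columns here are sorted by term identifier rather than in the order drawn on the slide. The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def go_binary_matrix(annotations: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn per-protein GO CC term sets into a 0/1 protein x term matrix."""
    proteins = list(annotations)
    terms = sorted(set().union(*annotations.values()))  # one column per observed term
    matrix = [[int(term in annotations[p]) for term in terms] for p in proteins]
    return proteins, terms, matrix


# Term sets restricted to the three GO columns visible on the slide.
annotations = {
    "O00767": {"GO:0016021", "GO:0005789", "GO:0005783"},
    "P51648": {"GO:0016021", "GO:0005789"},
    "Q2TAA5": {"GO:0016021", "GO:0005789"},
    "Q9UKV5": set(),
}
proteins, terms, matrix = go_binary_matrix(annotations)
print(terms)   # ['GO:0005783', 'GO:0005789', 'GO:0016021']
print(matrix)  # [[1, 1, 1], [0, 1, 1], [0, 1, 1], [0, 0, 0]]
```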

[Figure: PCA plot, PC1 (40.28%) vs PC2 (25.7%), with 40S ribosome, 60S ribosome, cytosol, endoplasmic reticulum, lysosome, mitochondrion, nucleus − chromatin, nucleus − nucleolus, plasma membrane, proteasome and unknown proteins.]

Data from mouse stem cells (E14TG2a)

We use a class-weighted kNN transfer learning algorithm to combine primary and auxiliary data, based on Wu and Dietterich (2004):

$$V(c_i)_j = \theta^*_i \, n^P_{j,i} + (1 - \theta^*_i) \, n^A_{j,i}$$

where $n^P_{j,i}$ and $n^A_{j,i}$ are the neighbour-matrix entries of protein j for class $c_i$ in the primary and auxiliary data, and $\theta^*_i$ is the weight of class $c_i$.

Classes and weights: $C = \{c_1, \ldots, c_l\}$; $\Theta = \{0, 0.5, 1\}$

Primary data:

$$L^P = \begin{pmatrix} q_{1,1} & q_{1,2} & \cdots & q_{1,m} \\ q_{2,1} & q_{2,2} & \cdots & q_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ q_{j,1} & q_{j,2} & \cdots & q_{j,m} \end{pmatrix}; \quad \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}; \quad k^P$$

Auxiliary data:

$$L^A = \begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{j,1} & b_{j,2} & \cdots & b_{j,n} \end{pmatrix}; \quad \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}; \quad k^A$$

Neighbour matrices (rows: proteins; columns: classes $c_1, \ldots, c_l$):

$$N^P = \begin{pmatrix} n^P_{1,1} & \cdots & n^P_{1,l} \\ n^P_{2,1} & \cdots & n^P_{2,l} \\ \vdots & & \vdots \end{pmatrix}; \quad N^A = \begin{pmatrix} n^A_{1,1} & \cdots & n^A_{1,l} \\ n^A_{2,1} & \cdots & n^A_{2,l} \\ \vdots & & \vdots \end{pmatrix}$$
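One way to fill the neighbour matrices, consistent with the worked example on the following slides (entries 3/3, 1/3 and 2/3 for k = 3), is to record for each protein the proportion of its k nearest labelled neighbours that belong to each class; this is done once in the primary feature space (with $k^P$) and once in the auxiliary space (with $k^A$). The plain-Python sketch below assumes Euclidean distance and uses hypothetical one-dimensional profiles chosen to reproduce that example.

```python
import math
from typing import List, Sequence, Tuple


def neighbour_matrix(labelled: Sequence[Tuple[List[float], str]],
                     unlabelled: Sequence[List[float]],
                     classes: Sequence[str], k: int) -> List[List[float]]:
    """Entry [j][i]: share of protein j's k nearest labelled neighbours in class i."""
    rows = []
    for x in unlabelled:
        nearest = sorted(labelled, key=lambda ex: math.dist(x, ex[0]))[:k]
        rows.append([sum(label == c for _, label in nearest) / k for c in classes])
    return rows


# Hypothetical one-dimensional profiles, chosen to reproduce the slide's example.
labelled = [([0.0], "c1"), ([0.1], "c1"), ([0.2], "c1"), ([1.0], "c2"), ([1.1], "c2"), ([5.0], "c3")]
unlabelled = [[0.05], [0.8]]  # proteins p1 and p2
for row in neighbour_matrix(labelled, unlabelled, ["c1", "c2", "c3"], k=3):
    print([round(v, 2) for v in row])
# [1.0, 0.0, 0.0]
# [0.33, 0.67, 0.0]
```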

[Figure: unlabelled proteins 1 and 2 and their nearest labelled neighbours from classes c1, c2 and c3.]

$$N^P = \begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline p_1 & 3/3 & 0 & 0 \\ p_2 & 1/3 & 2/3 & 0 \\ \vdots & \vdots & \vdots & \vdots \end{array}$$

Weights matrix (labelled): one row per candidate weight vector $\theta_i$ (one weight per class), each scored by F1 on the labelled data.

$$\begin{array}{c|ccc|c} & c_1 & c_2 & c_3 & \\ \hline \theta_1 & 0 & 0 & 0 & F1_1 \\ \theta_2 & 0 & 0 & 1 & F1_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \theta_i & \cdots & \cdots & \cdots & F1_i \\ \vdots & 1 & 1 & 0 & \vdots \\ \theta_{|\Theta|^l} & 1 & 1 & 1 & F1_{|\Theta|^l} \end{array}$$

Best-scoring weights: $\theta^* = \{1, 0, 1\}$
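Choosing $\theta^*$ amounts to an exhaustive search over one weight per class drawn from $\Theta$, i.e. $|\Theta|^l$ candidates (27 for $l = 3$ and $\Theta = \{0, 0.5, 1\}$), keeping the best-scoring vector. In the approach described here the score is an F1 measure computed on the labelled data; in the plain-Python sketch below, `toy_score` is a hypothetical placeholder for that score, and the code only shows the search itself.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Weights = Tuple[float, ...]


def best_class_weights(n_classes: int, score: Callable[[Weights], float],
                       grid: Sequence[float] = (0.0, 0.5, 1.0)
                       ) -> Tuple[Optional[Weights], float]:
    """Exhaustively score every per-class weight vector in grid ** n_classes."""
    best: Optional[Weights] = None
    best_score = float("-inf")
    for theta in product(grid, repeat=n_classes):  # len(grid) ** n_classes candidates
        s = score(theta)
        if s > best_score:  # strict '>' keeps the first of any tied vectors
            best, best_score = theta, s
    return best, best_score


def toy_score(theta: Weights) -> float:
    """Hypothetical stand-in for an F1 score: rewards closeness to (1, 0, 1)."""
    return -sum((t - u) ** 2 for t, u in zip(theta, (1.0, 0.0, 1.0)))


print(best_class_weights(3, toy_score)[0])  # (1.0, 0.0, 1.0)
```

The number of candidates grows exponentially with the number of classes, which is worth keeping in mind for data sets with many organelles.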

Class-weighted classifier (unlabelled):

$$V(c_i)_j = \theta^*_i \, n^P_{j,i} + (1 - \theta^*_i) \, n^A_{j,i}$$

$$\begin{array}{c|ccc} & c_1 & \cdots & c_l \\ \hline 1 & V(c_1)_1 & \cdots & V(c_l)_1 \\ 2 & V(c_1)_2 & \cdots & V(c_l)_2 \\ 3 & V(c_1)_3 & \cdots & V(c_l)_3 \\ \vdots & \vdots & & \vdots \\ j & V(c_1)_j & \cdots & V(c_l)_j \end{array}$$

$$y_j = \arg\max_i V(c_i)_j$$

$\theta^* = \{1, 0, 1\}$

$$N^P = \begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline p_1 & 3/3 & 0 & 0 \\ p_2 & 1/3 & 2/3 & 0 \\ \vdots & \vdots & \vdots & \vdots \end{array}$$

$$\begin{aligned}
V(c_1)_1 &= 1 \times 3/3 + (1 - 1) \times n^A_{1,1} \\
V(c_2)_1 &= 0 \times 0 + (1 - 0) \times n^A_{1,2} \\
V(c_3)_1 &= 1 \times 0 + (1 - 1) \times n^A_{1,3} \\
V(c_1)_2 &= 1 \times 1/3 + (1 - 1) \times n^A_{2,1} \\
V(c_2)_2 &= 0 \times 2/3 + (1 - 0) \times n^A_{2,2} \\
V(c_3)_2 &= 1 \times 0 + (1 - 1) \times n^A_{2,3}
\end{aligned}$$

Class-weighted classifier (unlabelled):

$$V(c_i)_j = \theta^*_i \, n^P_{j,i} + (1 - \theta^*_i) \, n^A_{j,i}$$

$$\begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline 1 & V(c_1)_1 & V(c_2)_1 & V(c_3)_1 \\ 2 & V(c_1)_2 & V(c_2)_2 & V(c_3)_2 \\ \vdots & \vdots & \vdots & \vdots \\ j & & & \end{array}$$

$$y_j = \arg\max_i V(c_i)_j$$
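The worked example translates directly into code. This plain-Python sketch reuses $\theta^* = \{1, 0, 1\}$ and the $N^P$ rows for $p_1$ and $p_2$ from the slide; the $N^A$ rows are hypothetical values, since the slide leaves them symbolic, and the function name is illustrative.

```python
from typing import List, Sequence


def class_weighted_vote(theta: Sequence[float], n_p: List[List[float]],
                        n_a: List[List[float]], classes: Sequence[str]) -> List[str]:
    """y_j = argmax_i V(c_i)_j with V(c_i)_j = theta_i * nP[j][i] + (1 - theta_i) * nA[j][i]."""
    labels = []
    for row_p, row_a in zip(n_p, n_a):
        votes = [t * p + (1 - t) * a for t, p, a in zip(theta, row_p, row_a)]
        labels.append(classes[max(range(len(votes)), key=votes.__getitem__)])
    return labels


theta_star = (1, 0, 1)
n_p = [[3 / 3, 0, 0], [1 / 3, 2 / 3, 0]]  # N^P rows for p1 and p2, as on the slide
n_a = [[0, 1 / 3, 2 / 3], [0, 1, 0]]      # hypothetical N^A rows
print(class_weighted_vote(theta_star, n_p, n_a, ["c1", "c2", "c3"]))  # ['c1', 'c2']
```

Because $\theta^*_2 = 0$, the vote for $c_2$ ignores $N^P$ entirely and relies only on the auxiliary neighbours, while $c_1$ and $c_3$ ($\theta^* = 1$) ignore $N^A$.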

[Figure, panels A to E: per-class F1 scores of the combined, primary and auxiliary classifiers for 40S ribosome, 60S ribosome, cytosol, endoplasmic reticulum, lysosome, mitochondrion, nucleus − chromatin, nucleus − nucleolus, plasma membrane and proteasome; PCA plots, PC1 (3.43%) vs PC2 (2.08%) and PC1 (40.28%) vs PC2 (25.7%), of the same proteins; overall F1 scores of the combined, primary and auxiliary classifiers; distribution of the classifier weights (0, 1/3, 2/3, 1) for each class.]

Data from mouse stem cells (E14TG2a).

From supervised machine learning (SML) to transfer learning: learn from heterogeneous data sources (experimental spatial proteomics and GO annotation, sequence features, imaging data) to infer localisation more reliably (Breckels et al., 2015).

[Figure: classification scores (0.25 to 1.00) for knn, knn-TL, svm and svm-TL, split by outcome (correct, incorrect).]

Plan

Introduction

Spatial proteomics

Data analysis

Transfer learning

Dynamics

Dual-localisation: proteins may be present simultaneously in several organelles (e.g. trafficking).

[Figure: PCA plot, PC1 (64.36%) vs PC2 (22.34%), with ER lumen, ER membrane, Golgi, mitochondrion, plastid, PM, ribosome, TGN, vacuole and unknown proteins.]

From Betschinger et al. (2013)

[Figure: Mouse ESC (E14TG2a) in serum LIF. PCA plot, PC1 (50.05%) vs PC2 (24.61%), with actin cytoskeleton, cytosol, endosome, ER/GA, extracellular matrix, lysosome, mitochondria, nucleus − chromatin, nucleus − nucleolus, peroxisome, plasma membrane, proteasome, ribosome 40S, ribosome 60S and unknown proteins; Tfe3 is highlighted.]

Spatial dynamics

Trans-localisation: changes in localisation upon perturbation.

[Figure: PCA plots of Condition 1, PC1 (43.43%) vs PC2 (39.04%), and Condition 2, PC1 (39.04%) vs PC2 (30.9%), with cytoplasm, ER, Golgi, mitochondrial, nuclei, plasma membrane, proteasome & ribosome, vacuole and unknown proteins.]

Page 39: Mapping the sub-cellular proteome · 2015-12-04 · Spatial proteomics - Why? Mis-localisation Disruption of the targeting/tra cking process alters proper sub-cellular localisation,

Spatial dynamics

d1 = dist(profilerep1condition1

, profilerep1condition2

)

d2 = dist(profilerep2condition1

, profilerep2condition2

)

[Figure: log2(d1/d2) plotted against the mean distance (d1 + d2)/2 for each protein, alongside PCA plots (PC1 vs PC2) for Condition 1 and Condition 2 (axis labels: PC1 43.43% / PC2 39.04% and PC1 39.04% / PC2 30.9%), with proteins coloured by marker class (cytoplasm, ER, Golgi, Mitochondrial, Nuclei, Plasma membrane, Proteasome & Ribosome, Vacuole, unknown) and five points labelled 1 to 5.]
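The two distances on this slide, and the axes of the resulting plot, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pRoloc implementation: each replicate/condition is assumed to be a NumPy array of normalised profiles (proteins × fractions, rows aligned on the same proteins), and `dist` is taken to be the Euclidean distance (the default of R's `dist()`). The function name `movement_stats` is hypothetical. A protein that shifts by a similar amount in both replicates gets log2(d1/d2) close to 0, so a large mean distance combined with a near-zero log ratio is one plausible signature of a reproducible change in localisation.

```python
import numpy as np


def movement_stats(rep1_cond1, rep1_cond2, rep2_cond1, rep2_cond2):
    """Per-protein movement statistics between two conditions (hypothetical helper).

    Each argument is an (n_proteins, n_fractions) array of normalised
    profiles; rows must refer to the same proteins in the same order.
    Returns (mean_distance, log2_ratio) as 1-D arrays.
    """
    # d1, d2: Euclidean distance (assumed metric) between each protein's
    # condition 1 and condition 2 profiles, within replicate 1 and replicate 2.
    d1 = np.linalg.norm(rep1_cond1 - rep1_cond2, axis=1)
    d2 = np.linalg.norm(rep2_cond1 - rep2_cond2, axis=1)
    mean_distance = (d1 + d2) / 2   # x-axis of the plot: (d1 + d2)/2
    log2_ratio = np.log2(d1 / d2)   # y-axis: log2(d1/d2), near 0 when replicates agree
    return mean_distance, log2_ratio


# Toy data: 2 proteins x 3 fractions.
# Protein 0 moves by the same amount in both replicates;
# protein 1 moves twice as far in replicate 1 as in replicate 2.
r1c1 = np.array([[1.0, 0.0, 0.0], [0.0, 0.2, 0.8]])
r1c2 = np.array([[0.0, 1.0, 0.0], [0.0, 0.4, 0.6]])
r2c1 = np.array([[1.0, 0.0, 0.0], [0.0, 0.2, 0.8]])
r2c2 = np.array([[0.0, 1.0, 0.0], [0.0, 0.3, 0.7]])

mean_d, lr = movement_stats(r1c1, r1c2, r2c1, r2c2)
print(mean_d, lr)
```

If a protein's profile is identical between conditions in replicate 2, d2 is 0 and the log ratio becomes infinite; adding a small pseudo-count to both distances is one possible workaround (an assumption, not something stated on the slide).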

Page 40

Beyond organelles: application to PPIs and protein complexes

[Figure: PCA plot, PC1 (47.02%) vs PC2 (22.25%), with proteins coloured by complex markers: 14-3-3, 19S, 20S, 40S, 60S, CCT, eIF3, Ku70/Ku80, PA28, Rab, unknown.]

Figure: Data on proteasome complexes from Fabre et al., Mol Syst Biol (2015), DOI: 10.15252/msb.20145497
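The percentages on the PCA axes (e.g. PC1 (47.02%)) are the share of total variance captured by each component. A minimal NumPy sketch of such a projection is shown below, assuming the profiles are stored as a proteins × fractions array. It only centres the columns; whether the original plots also scaled the data is not stated on the slide. `pca_2d` is a hypothetical helper name, not a pRoloc function.

```python
import numpy as np


def pca_2d(profiles):
    """Project profiles onto their first two principal components (hypothetical helper).

    profiles: (n_proteins, n_fractions) array with at least two rows and columns.
    Returns (scores, pct_var): scores is (n_proteins, 2) and pct_var holds
    the percentage of total variance explained by PC1 and PC2.
    """
    centred = profiles - profiles.mean(axis=0)  # centre each fraction (column)
    # Thin SVD of the centred matrix: rows of vt are the principal axes,
    # singular values in s come back sorted in decreasing order.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:2].T                 # coordinates on PC1 and PC2
    var = s ** 2                                # proportional to variance per PC
    pct_var = 100 * var[:2] / var.sum()         # values behind "PC1 (xx.xx%)" labels
    return scores, pct_var


# Toy data: 4 proteins that differ only along the first fraction,
# so PC1 should capture all of the variance.
toy = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [2.0, 0.0, 0.0],
                [3.0, 0.0, 0.0]])
scores, pct = pca_2d(toy)
print(f"PC1 ({pct[0]:.2f}%), PC2 ({pct[1]:.2f}%)")
```

The sign of each principal axis is arbitrary in an SVD, so two tools can produce mirror-image PCA plots of the same data while reporting identical percentages.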

Page 41

Software for mass spectrometry and (spatial) proteomics

Bioconductor: open source; enables reproducible research and understanding of the data (not a black box), and drives scientific innovation.

- MSnbase: infrastructure to handle quantitative data and meta-data (Gatto and Lilley, 2012) (~350 unique IP downloads/month).

- pRoloc and pRolocGUI: dedicated visualisation and machine learning (ML) infrastructure for spatial proteomics (Gatto et al., 2014) (~160 unique IP downloads/month in 2014).

- pRolocdata: structured and annotated spatial proteomics data (Gatto et al., 2014).

- And more generally, RforProteomics (Gatto and Christoforou, 2014) (~100 unique IP downloads/month in 2014).

Page 42

J Betschinger, J Nichols, S Dietmann, PD Corrin, PJ Paddison, and A Smith. Exit from pluripotency is gated by intracellular redistribution of the bHLH transcription factor Tfe3. Cell, 153(2):335–47, Apr 2013. doi:10.1016/j.cell.2013.03.012.

LM Breckels, L Gatto, A Christoforou, AJ Groen, KS Lilley, and MW Trotter. The effect of organelle discovery upon sub-cellular protein localisation. J Proteomics, 88:129–40, Aug 2013.

TPJ Dunkley, S Hester, IP Shadforth, J Runions, T Weimar, SL Hanton, JL Griffin, C Bessant, F Brandizzi, C Hawes, RB Watson, P Dupree, and KS Lilley. Mapping the Arabidopsis organelle proteome. PNAS, 103(17):6518–6523, Apr 2006.

LJ Foster, CL de Hoog, Y Zhang, Y Zhang, X Xie, VK Mootha, and M Mann. A mammalian organelle map by protein correlation profiling. Cell, 125(1):187–199, Apr 2006.

L Gatto and A Christoforou. Using R and Bioconductor for proteomics data analysis. Biochim Biophys Acta, 1844(1 Pt A):42–51, Jan 2014.

L Gatto and KS Lilley. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics, 28(2):288–9, Jan 2012.

L Gatto, JA Vizcaino, H Hermjakob, W Huber, and KS Lilley. Organelle proteomics experimental designs and analysis. Proteomics, 2010.

L Gatto, LM Breckels, S Wieczorek, T Burger, and KS Lilley. Mass-spectrometry based spatial proteomics data analysis using pRoloc and pRolocdata. Bioinformatics, Jan 2014.

TR Kau, JC Way, and PA Silver. Nuclear transport and cancer: from mechanism to intervention. Nat Rev Cancer, 4(2):106–17, Feb 2004.

K Laurila and M Vihinen. Prediction of disease-related mutations affecting protein localization. BMC Genomics, 10:122, 2009.

DJL Tan, H Dvinge, A Christoforou, P Bertone, A Arias Martinez, and KS Lilley. Mapping organelle proteins and protein complexes in Drosophila melanogaster. J Proteome Res, 8(6):2667–2678, Jun 2009.

P Wu and TG Dietterich. Improving SVM accuracy by training on auxiliary data sources. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, New York, NY, USA, 2004. ACM.

Page 43

Acknowledgements

- Lisa Breckels, Computational Proteomics Unit, Cambridge (ML, algorithms)

- Sean Holden, Computer Laboratory, Cambridge (ML)

- Kathryn Lilley, Cambridge Centre for Proteomics (proteomics)

Funding: BBSRC, PRIME-XS EU FP7, Software Sustainability Institute (SSI)

Slides available at http://goo.gl/SZRMjg, under a CC-BY license.

Thank you for your attention