

Mass spectrometry and ribosome profiling, a perfect combination towards a more comprehensive identification strategy of true in vivo protein forms.

Jeroen Crappé*,a; Alexander Koch*,a; Sandra Steyaert a; Wim Van Criekinge a; Petra Van Damme b,c; Gerben Menschaert a

CONTACT: Gerben.Menschaert@Ugent.be, Jeroen.Crappe@Ugent.be

MATERIALS & METHODS

WORKFLOW OVERVIEW

1. Ribosome profiling (RIBO-seq): sequencing of ribosome-captured mRNA fragments on an Illumina HiSeq platform.

2. Quality control based on existing tools such as FastQC and a custom metagenic functional assessment.

3. Transcriptome mapping based on the STAR [4] and/or TopHat2 [5] local aligners.

4. Single nucleotide polymorphism (SNP) calling based on SAMtools mpileup and/or GATK.

5. Translation initiation site (TIS) calling based on a trained Support Vector Machine (SVM) [2] or a rule-based algorithm [1].

6. Translation product assembly, taking SNP, INDEL and TIS information into account. Construct the complete proteome, optionally combined with SwissProt.

7. MS-based proteomics/peptidomics using the SearchGUI [6] and PeptideShaker [7] tools.

8. Genome-centric visualization of all generated information tracks (Ensembl, UCSC, IGV genome browsers).

=> All results are stored in a relational SQLite database, allowing further detailed analysis.
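Step 6 and the SQLite storage can be illustrated with a minimal sketch. It assumes a toy transcript, a 0-based TIS position and (position, reference, alternative) SNP tuples. The function names, the table schema and the choice to decode a near-cognate start codon as methionine are illustrative assumptions, not the pipeline's actual code.

```python
import sqlite3
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_snps(transcript, snps):
    """Return the transcript with (position, ref, alt) SNPs applied (0-based)."""
    seq = list(transcript)
    for pos, ref, alt in snps:
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt
    return "".join(seq)


def translate_from_tis(transcript, tis):
    """Translate from a called TIS up to the first in-frame stop codon."""
    protein = ["M"]  # assumption: initiator decoded as Met, also for CTG/GTG
    for i in range(tis + 3, len(transcript) - 2, 3):
        aa = CODON_TABLE[transcript[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical transcript: 5' leader, GTG start at index 2, then GCA TTC AAA TAA.
transcript = "CCGTGGCATTCAAATAAGG"
reference_product = translate_from_tis(transcript, tis=2)
variant_product = translate_from_tis(apply_snps(transcript, [(9, "T", "G")]), tis=2)

# Hypothetical single-table schema; the poster does not describe the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translation_product (tis INTEGER, snp_aware INTEGER, sequence TEXT)"
)
conn.executemany(
    "INSERT INTO translation_product VALUES (?, ?, ?)",
    [(2, 0, reference_product), (2, 1, variant_product)],
)
rows = conn.execute(
    "SELECT sequence FROM translation_product ORDER BY snp_aware"
).fetchall()
print(rows)
```

In this toy case the SNP changes the codon TTC to TGC, so the SNP-aware product differs from the reference product by one residue. A tryptic peptide spanning that residue would only be matched when the variant sequence is present in the search database.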

INTRODUCTION

An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS). Recently, a strategy termed ribosome profiling has been described. It is based on deep sequencing of ribosome-protected mRNA fragments and thereby indirectly monitors protein synthesis. When used in combination with initiation-specific translation inhibitors, it enables the identification of (alternative) translation initiation sites.

[Figure: RIBO-seq experimental workflow [3]]

In contrast to the protein databases routinely employed in proteomics searches, RIBO-seq derived data give a more representative expression state and account for sequence variation information (single nucleotide polymorphisms, insertions, deletions and RNA splice variants) and for alternative translation initiation leading to N-terminally extended and/or truncated protein forms. Furthermore, RIBO-seq reveals translation starts at near-cognate start sites. Without taking this information into account, MS-based proteomic studies may fail to detect novel, important protein forms.

[Figure: MOA of initiation-specific translation inhibitors and example of gene-mapped RIBO-seq signals [1,2]]
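To make the notion of a near-cognate start site concrete, the sketch below enumerates the codons that differ from ATG by exactly one nucleotide and scans a sequence for them. The helper names and the toy sequence are hypothetical. How efficiently each codon actually initiates translation is a separate question that this enumeration does not address.

```python
def near_cognate_codons(cognate="ATG"):
    """Return all codons at Hamming distance 1 from the cognate start codon."""
    codons = []
    for i, base in enumerate(cognate):
        for alt in "ACGT":
            if alt != base:
                codons.append(cognate[:i] + alt + cognate[i + 1:])
    return sorted(codons)


def find_start_candidates(seq):
    """List (index, codon) for every cognate or near-cognate triplet in seq."""
    starts = set(near_cognate_codons()) | {"ATG"}
    return [
        (i, seq[i:i + 3])
        for i in range(len(seq) - 2)
        if seq[i:i + 3] in starts
    ]


codons = near_cognate_codons()
candidates = find_start_candidates("CCGTGGCATTC")  # hypothetical sequence
print(codons)
print(candidates)
```

A purely sequence-based scan like this yields many candidates; in the workflow described here it is the RIBO-seq signal that indicates which of them are actually used.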

GOALS

✓ Compile a sample-specific protein search database based on ribosome profiling sequencing data.

✓ Introduce new translation products into the MS search space: N-terminal extensions/truncations, translated uORFs, near-cognate start sites.

✓ Bridge two omics worlds, transcriptomics and MS-based proteomics, by means of RIBO-seq.
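Compiling the search database can be sketched as merging RIBO-seq derived translation products with SwissProt entries into one non-redundant FASTA file. The entry names and sequences below are hypothetical, and exact-sequence deduplication is an assumed strategy that the poster does not specify.

```python
def merge_databases(swissprot, riboseq):
    """Merge two {header: sequence} maps, keeping one entry per unique sequence.

    SwissProt is processed first, so its header is kept for shared sequences.
    """
    merged = {}
    seen = set()
    for source in (swissprot, riboseq):
        for header, seq in source.items():
            if seq not in seen:
                seen.add(seq)
                merged[header] = seq
    return merged


def to_fasta(entries, width=60):
    """Render {header: sequence} as FASTA text with fixed-width sequence lines."""
    lines = []
    for header, seq in entries.items():
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical entries: one annotated protein, one identical RIBO-seq product,
# and one N-terminally extended product that is new to the search space.
swissprot = {"swissprot_toy_1": "MAFK"}
riboseq = {"riboseq_toy_1_canonical": "MAFK", "riboseq_toy_1_nterm_ext": "MLSPMAFK"}

merged = merge_databases(swissprot, riboseq)
fasta = to_fasta(merged)
print(fasta)
```

Only the extended product adds a new sequence here, which mirrors the goal of introducing N-terminal extensions into the MS search space without duplicating annotated entries.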

CONCLUSION & FUTURE WORK

✓ Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events [8].

✓ Future work will mainly focus on:

→ Further investigation of the differential expression at the translation level of UniProtKB-SwissProt and RIBO-seq derived translation products: technological and/or biological relevance? Detailed assessment of the difference between in vivo measurement of protein synthesis (RIBO-seq) and protein presence (MS-based proteomics).

→ Generalizing the pipeline to all types of next-generation sequencing data: (directional) A+/A- RNA-seq, CLIP-seq, exome-seq, ribo-seq.

→ Quantitative correlation of RIBO-seq and (non-)labelled MS-based proteomics.

→ Incorporating the pipeline into Galaxy-P [9].

REFERENCES

[1] Lee, S., Liu, B., Lee, S., Huang, S. X., Shen, B., and Qian, S. B. (2012) Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A. 109, E2424–E2432.
[2] Ingolia, N. T., Lareau, L. F., and Weissman, J. S. (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802.
[3] Michel, A. M., and Baranov, P. V. (2013) Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale. Wiley Interd. Rev. RNA. Epub ahead of print.
[4] Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T. R. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29 (1), 15–21.
[5] Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S. L. (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology 14, R36.
[6] Vaudel, M., Barsnes, H., Berven, F. S., Sickmann, A., and Martens, L. (2011) SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11 (5), 996–999.
[7] http://peptide-shaker.googlecode.com
[8] Menschaert, G., Van Criekinge, W., Notelaers, T., Koch, A., Crappé, J., Gevaert, K., and Van Damme, P. (2013) Deep proteome coverage based on ribosome profiling aids MS-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol. Cell. Prot. Epub ahead of print.
[9] http://getgalaxyp.org

AFFILIATIONS

a Laboratory of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent
b Department of Medical Protein Research, Flemish Institute for Biotechnology, B-9000 Ghent
c Department of Biochemistry, Ghent University, B-9000 Ghent
* Co-first authors

RESULTS

MS experiments, 2 data sets:
Shotgun proteomics: 24 LC runs -> 112,974 MS/MS spectra.
Positional proteomics (N-term COFRADIC): 45 LC runs -> 68,523 MS/MS spectra.

[Figure, panel A: pie charts for Ribo-seq (n = 13,454) and N-terminomics (n = 1,835). Slice values as extracted from the figure: 1556, 259, 16, 3, 1, 4.]

(A) COFRADIC: Pie charts depicting the identification of novel translation products: N-terminal extensions, truncations, uORF translations and internal out-of-frame translations (left panel is RIBO-seq based, right panel is MS-based using the custom DB). (B) SHOTGUN: Pie chart showing a 2.5% gain in protein identifications using the combined RIBO-seq derived and SwissProt search database. (C) WebLogos depicting the sequence context (three bases upstream and four bases downstream) of the newly identified translation initiation sites, clearly pointing to near-cognate translation initiation.
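The quantity plotted on the bits axis of a sequence logo can be sketched as the per-position information content of the aligned TIS contexts. The four sequences below are hypothetical (three bases upstream, the start codon, four bases downstream), and the calculation omits the small-sample correction that logo tools may apply.

```python
import math
from collections import Counter


def information_content(sequences):
    """Per-column information content in bits for equal-length DNA sequences."""
    contents = []
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        contents.append(2.0 - entropy)  # log2(4) = 2 bits maximum for DNA
    return contents


# Hypothetical TIS contexts: the first start-codon base varies (C/G/A/T),
# while the second and third bases (T, G) are fully conserved.
contexts = [
    "ACCCTGGCGA",
    "GCCGTGGAGC",
    "CCCATGACGT",
    "TCCTTGGTGG",
]
bits = information_content(contexts)
print([round(b, 2) for b in bits])
```

A start-codon logo with high information at the second and third positions and low information at the first position is the pattern one would expect from a mixture of ATG and near-cognate starts such as CTG and GTG.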

Version 1 Custom DB [8] based on:
- SVM-based TIS calling [2]
- Using the reference genomic sequence
- Combination of RIBO-seq derived and SwissProt sequences

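[Figure, panel C: two WebLogo 3.3 sequence logos of the TIS context, one with a probability axis (0.0 to 1.0) and one with a bits axis (0.0 to 2.0).]

[Figure, panel B: pie charts of shotgun protein identifications (n = 3252). Labels as extracted: swiss, new, mutation, n-term-ext, alternative, isoform; values as extracted: 3117, 86, 27, 11, 11, 49. Second chart labels: non-swiss, n-term-ext, mutation; values: 83, 2, 1.]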

Version 2 Custom DB based on:
- Rule-based TIS calling [1]
- Including SNP-INDEL information
- Only Ribo-seq derived sequences

Examples:

(A) UCSC genome browser screenshot showing the extended form of the Fxr2 gene translation product starting at a near-cognate GTG start site. The identified tryptic peptide is also depicted. (B) Two examples of (non-)synonymous SNPs with overlapping tryptic peptide identifications resulting from the custom RIBO-seq derived protein product database.

[Figure, panel A: near-cognate GTG start triplet.]

[Figure, panel B: synonymous mutation, Vps53 gene.]

Non-synonymous mutation, Srp19 gene:

generic_Q9D7A6|uc008ekg.1_36|Q9D7A6 MACAAARSPADQDRFICIYPAYLNNKKTIAEGRRIPISKAVENPTATEIQDVCSAVGLNA
sp|Q9D7A6|SRP19_MOUSE               MACSAARPPADQDRFIFIYPAYLNNKKTIAEGRRIPISKAVENPTATEIQDVCSAVGLNA
                                    ***:*** ******** *******************************************

generic_Q9D7A6|uc008ekg.1_36|Q9D7A6 FLEKNKMYSREWNRDVQFRGRVRVQLKQEDGSLCLVQFPSRKSVMLYVAEMIPKLKTRTQ
sp|Q9D7A6|SRP19_MOUSE               FLEKNKMYSREWNRDVQFRGRVRVQLKQEDGSLCLVQFPSRKSVMLYVAEMIPKLKTRTQ
                                    ************************************************************

generic_Q9D7A6|uc008ekg.1_36|Q9D7A6 KSGGADPSLQQGEGSKKGKGKKKK
sp|Q9D7A6|SRP19_MOUSE               KSGGADPILQQGEGSKKGKGKKKK
                                    ******* ****************
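The differences between the two Srp19 sequences can be listed programmatically. The sketch below copies both sequences from the alignment on this poster and reports each mismatch as a 1-based position with the SwissProt residue and the RIBO-seq derived residue. The variable names are illustrative, and the position-by-position comparison assumes equal-length sequences without gaps, as is the case here.

```python
# Sequences copied from the alignment (RIBO-seq derived entry vs SwissProt entry).
riboseq_srp19 = (
    "MACAAARSPADQDRFICIYPAYLNNKKTIAEGRRIPISKAVENPTATEIQDVCSAVGLNA"
    "FLEKNKMYSREWNRDVQFRGRVRVQLKQEDGSLCLVQFPSRKSVMLYVAEMIPKLKTRTQ"
    "KSGGADPSLQQGEGSKKGKGKKKK"
)
swissprot_srp19 = (
    "MACSAARPPADQDRFIFIYPAYLNNKKTIAEGRRIPISKAVENPTATEIQDVCSAVGLNA"
    "FLEKNKMYSREWNRDVQFRGRVRVQLKQEDGSLCLVQFPSRKSVMLYVAEMIPKLKTRTQ"
    "KSGGADPILQQGEGSKKGKGKKKK"
)


def substitutions(reference, sample):
    """Return (1-based position, reference residue, sample residue) mismatches."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length for this comparison")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


diffs = substitutions(swissprot_srp19, riboseq_srp19)
for pos, ref, alt in diffs:
    print(f"{ref}{pos}{alt}")
```

Each reported position marks a residue where a peptide identified against the RIBO-seq derived database could not have been matched against the SwissProt sequence alone.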

[Figure, panel C: bar chart of protein identifications (axis 0 to 4000) for the four search databases listed in the table below.]

Shotgun identifications using custom protein DB

Search database                                         Status  Proteins*  Peptides*  PSMs*
SVM-based TIS calling (version 1, no SwissProt)         OLD     2194       14519      32818
SVM-based TIS calling (version 1, + SwissProt)          OLD     3252       -          -
Stringent rule-based TIS calling                        NEW     3295       15913      32348
Stringent rule-based TIS calling, SNP/INDEL aware       NEW     3341       14967      32620

* 1% FDR level on protein, peptide and PSM level
- Value not given on the poster
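The protein counts in the table can be compared as relative gains. The sketch below uses only the protein-level numbers reported above. The dictionary keys and the helper name are illustrative, and the percentages are plain arithmetic on those counts, not additional results.

```python
# Protein identifications at 1% FDR, taken from the table above.
protein_counts = {
    "svm_v1_no_swissprot": 2194,
    "svm_v1_plus_swissprot": 3252,
    "rule_based": 3295,
    "rule_based_snp_indel": 3341,
}


def relative_gain(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


snp_indel_gain = relative_gain(
    protein_counts["rule_based"], protein_counts["rule_based_snp_indel"]
)
v2_vs_v1_gain = relative_gain(
    protein_counts["svm_v1_plus_swissprot"], protein_counts["rule_based_snp_indel"]
)
print(f"SNP/INDEL awareness: +{snp_indel_gain:.1f}% proteins")
print(f"Version 2 (SNP/INDEL aware) vs version 1 + SwissProt: +{v2_vs_v1_gain:.1f}% proteins")
```

Note that the peptide and PSM counts do not move in the same direction as the protein counts across databases, so a protein-level gain alone does not summarize the full comparison.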