automated analysis of proteomics data on tap

Upload: thinkerbot

Post on 30-May-2018


TRANSCRIPT

  • 8/14/2019 Automated Analysis of Proteomics Data on Tap

    Automated Analysis of Proteomics Data on Tap

    Simon Chiang

    Wednesday, April 8, 2009

  • 8/14/2019 Automated Analysis of Proteomics Data on Tap

    2/57

    Tap (Task Application)

    [Figure: workflow diagram; node labels a, b, b1, c, c1, c2, ?, x, x1, x2, y, y1, z]


    MS/MS Identification

    Protein Mixture

    Digest to Peptides

    Measure Peptide Mass (MS)

    Fragment, Measure Fragment Masses (MS/MS)

    Identify Peptides

    Correlate to Proteins


    Peptide Identification

    Peptide Mass

    Fragmentation Spectrum (match/score)

    Experimental vs. Predicted from DB


    Peptide Identification

    Peptide Mass (filter)

    Fragmentation Spectrum

    Experimental vs. Predicted from DB
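    The two steps named on these slides, filtering candidates by peptide mass and then matching/scoring fragment spectra, can be sketched as follows. This is a minimal illustration in plain Python, not the algorithm of any particular search engine: the function names, the 0.5 Da tolerance, and the shared-peak-count score are assumptions made for the example. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in Da; WATER completes the intact peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def filter_by_mass(candidates, observed_mass, tolerance=0.5):
    """Keep candidates whose predicted mass lies within tolerance (Da) of the observed mass."""
    return [p for p in candidates if abs(peptide_mass(p) - observed_mass) <= tolerance]


def shared_peak_count(experimental, predicted, tolerance=0.5):
    """Toy score: how many predicted fragment m/z values have a matching experimental peak."""
    return sum(any(abs(e - p) <= tolerance for e in experimental) for p in predicted)


# Hypothetical candidates from a database digest; only PEPTIDE passes the mass filter.
candidates = ["PEPTIDE", "PEPTIDK", "GGGG"]
print(filter_by_mass(candidates, observed_mass=799.36))
```

    The mass filter is cheap and removes most of the database before the more expensive spectrum comparison runs.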


    PTM Identification

    Variable Modification
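    A variable modification may or may not be present at each candidate site, so every subset of sites has to be considered. The sketch below enumerates those subsets for phosphorylation on S, T, and Y. The function name and the choice of phosphorylation are assumptions for illustration; the mass shift is the standard monoisotopic value.

```python
from itertools import combinations

PHOSPHO = 79.96633  # monoisotopic mass shift of phosphorylation (Da)


def variable_mod_forms(sequence, residues="STY", delta=PHOSPHO):
    """Yield (modified positions, total mass shift) for every subset of candidate sites."""
    sites = [i for i, aa in enumerate(sequence) if aa in residues]
    for k in range(len(sites) + 1):
        for chosen in combinations(sites, k):
            yield chosen, k * delta


# A peptide with 3 candidate sites has 2**3 = 8 forms to search.
forms = list(variable_mod_forms("ASTYK"))
print(len(forms))
```

    With n candidate sites there are 2^n forms per peptide, which is why variable modifications enlarge the search space so quickly.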


    Crosslinks

    An unusual post-translational modification (modification by another peptide)

    Same process, scoring algorithms

    Main difference is prediction of crosslinked spectrum


    Search Space


    Iterative Searching

    Search normally

    Harvest strong peptide identifications

    Generate subset database

    Search unidentified for crosslinks vs subset
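    The harvest and subset steps above can be sketched as below. This is an assumption-laden illustration in plain Python: the hit records, the score threshold, and the dictionary-based database are hypothetical stand-ins for real search engine output and a FASTA file.

```python
def strong_proteins(identifications, threshold):
    """Accessions of proteins with at least one identification at or above threshold."""
    return {hit["protein"] for hit in identifications if hit["score"] >= threshold}


def subset_database(database, accessions):
    """Keep only the database entries whose accession was strongly identified."""
    return {acc: seq for acc, seq in database.items() if acc in accessions}


# Hypothetical first-pass results and database.
hits = [
    {"spectrum": "s1", "protein": "P1", "score": 60},
    {"spectrum": "s2", "protein": "P2", "score": 12},
    {"spectrum": "s3", "protein": "P1", "score": 45},
]
database = {"P1": "AAAK", "P2": "CCCK", "P3": "GGGK"}
subset = subset_database(database, strong_proteins(hits, threshold=40))
print(subset)
```

    The second, more expensive search then runs only against the much smaller subset.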


    Iterative Searching

    Strong Ids

    Subset (identified proteins)

    Crosslink Db


    Variation

    Search normally

    Harvest strong peptide identifications

    Generate subset database

    Search unidentified for PTMs vs subset


    Variation

    Search normally

    Harvest strong peptide identifications

    Repeat (different search engine)


    Variation

    Search normally

    Harvest strong peptide identifications


    Searching


    A Practical Matter

    Not (mostly) an issue of having software

    Time, complexity, configuration

    Processing of results

    Mostly an issue of using software


    Web Applications

    Basis for most search engines/tools

    Primarily a human interface

    Hard to automate a human*


    Tap-Mechanize

    Mechanize - a library for automating interaction with websites

    Used to redirect/resubmit HTTP requests
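    The idea of resubmitting an HTTP request can be illustrated without Mechanize or Tap-Mechanize themselves. The sketch below uses only Python's standard library to rebuild a form submission from previously captured fields, overriding the ones that change between runs. The URL, field names, and function name are hypothetical, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_resubmission(url, captured_fields, overrides):
    """Rebuild a captured form POST, replacing selected fields with new values."""
    fields = {**captured_fields, **overrides}
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical search form captured once in a browser, then reused for a new data file.
captured = {"database": "human", "enzyme": "trypsin", "file": "run1.mgf"}
request = build_resubmission("http://search.example.org/submit", captured, {"file": "run2.mgf"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

    Because the request mirrors what the browser sent, nothing about the web application has to change for it to be automated.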


    Tap-Mechanize

    User Application


    Advantages

    Utilizes native web interface

    Robust and adaptable

    Multi-page requests supported

    Lowest common denominator (works for most web apps)


    Tap (Task Application)

    A framework to automate workflows

    Define computational tasks

    Join tasks into workflows


    Tasks

    Submit data to a search engine

    Download results

    Convert a file format

    Perform a calculation

    Generate a report

    ...


    Sequence

    a b c


    Fork

    a b

    c


    Merge

    a

    b

    c
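    The three join patterns (sequence, fork, merge) can be sketched in a few lines. This is plain Python, not Tap's Ruby API, and it assumes the usual reading of the diagrams: a sequence passes each result to the next task, a fork sends one result to several tasks, and a merge lets several tasks feed one. The Task class and helper names are hypothetical.

```python
class Task:
    """A named computation whose result is passed on to any joined tasks."""

    def __init__(self, name, func):
        self.name, self.func, self.targets = name, func, []

    def run(self, value, log):
        result = self.func(value)
        log.append((self.name, result))
        for target in self.targets:
            target.run(result, log)


def sequence(*tasks):
    """a -> b -> c: each task feeds the next one."""
    for source, target in zip(tasks, tasks[1:]):
        source.targets.append(target)


def fork(source, *targets):
    """a -> b and a -> c: one result goes to several tasks."""
    source.targets.extend(targets)


def merge(target, *sources):
    """a -> c and b -> c: several tasks feed the same task."""
    for source in sources:
        source.targets.append(target)


a = Task("a", lambda x: x + 1)
b = Task("b", lambda x: x * 2)
c = Task("c", lambda x: x - 3)
sequence(a, b, c)
log = []
a.run(1, log)
print(log)
```

    Keeping the joins separate from the tasks is what lets the same task be reused in different workflows.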


    What Tap Does...

    Programmers: make/test, document, distribute

    Users: install, learn, configure/use


    Usage

    Standard command line

    Web interface in development


    Internals

    Written in Ruby

    DSL for docs, configs

    Distribution with RubyGems


    Examples


    Searching Perfect Data

    Digest Protein (ALBU_HUMAN, Trypsin)

    Generate Predicted Spectra (b, y ions, peptides n > 3 residues)

    Search
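    The digest step can be sketched as an in-silico tryptic digest. The version below is a minimal illustration in plain Python and assumes the common trypsin rule (cleave C-terminal to K or R, but not when the next residue is P) with no missed cleavages; the slide does not state which rule was used. The function name is hypothetical and the sequence is a toy example, not ALBU_HUMAN.

```python
def tryptic_digest(protein, min_length=4):
    """Cleave after K or R (not before P); keep peptides with at least min_length residues."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without a trailing K/R
    return [p for p in peptides if len(p) >= min_length]


# min_length=4 corresponds to "peptides n > 3 residues" on the slide.
print(tryptic_digest("AAKGGGGRPLLLKMR"))
```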


    Workflow

    Protein -> Digest -> [Peptides] -> Predict -> [Spectra] -> Search -> Result URL
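    The Predict step generates b and y ions for each peptide. A minimal sketch, assuming singly charged fragments and standard monoisotopic masses, is shown below in plain Python; the function name is hypothetical and no neutral losses or other ion series are included.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def by_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues plus a proton.
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i: last i residues plus water plus a proton.
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b, y = by_ions("GAK")
print([round(m, 4) for m in b])
print([round(m, 4) for m in y])
```

    A useful sanity check is that each complementary pair b_i and y_(n-i) sums to the neutral peptide mass plus two protons.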


    Peptide Fragmentation


    Protein Identification


    Unassigned Peptide?


    Peptide Mass Error?


    Explanations

    Unassigned peptides due to a server configuration

    Peptide mass error due to rounding (algorithm precision)

    MinPepLenInSearch 5
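    Both explanations can be reproduced with small numbers. The sketch below, in plain Python, assumes that a minimum search length of 5 means shorter peptides are never considered, and it uses per-residue rounding to two decimals purely as a hypothetical example of a precision effect; the slide does not say where the rounding actually occurred.

```python
def unsearchable(peptides, min_pep_len=5):
    """Peptides shorter than the server's minimum length can never be assigned."""
    return [p for p in peptides if len(p) < min_pep_len]


def rounding_error(residue_masses, decimals=2):
    """Difference between summing rounded residue masses and summing exact ones."""
    exact = sum(residue_masses)
    rounded = sum(round(m, decimals) for m in residue_masses)
    return rounded - exact


# A 4-residue peptide satisfies "n > 3" in the digest but is below the search minimum.
print(unsearchable(["AAAK", "GGGGK", "LLLLLR"]))

# Ten glycine residues: small per-residue rounding adds up across the peptide.
print(round(rounding_error([57.02146] * 10), 4))
```

    The error from rounding grows with peptide length, so it can exceed a tight mass tolerance for long peptides while going unnoticed for short ones.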


    Classify Results by GO Terms

    Search with Mascot

    Search with GPM

    Extract Intersection of Results

    Map Accessions to Entrez (PIR)

    Classify with GoGetter
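    The intersection and mapping steps reduce to set and lookup operations once each search engine's results are exported as accession lists. The sketch below is plain Python with placeholder accessions and a placeholder mapping table; it does not call Mascot, GPM, PIR, or GoGetter.

```python
def intersect_accessions(mascot_hits, gpm_hits):
    """Accessions reported by both search engines, in sorted order."""
    return sorted(set(mascot_hits) & set(gpm_hits))


def map_accessions(accessions, mapping):
    """Translate accessions via a lookup table and report the ones with no mapping."""
    mapped = {acc: mapping[acc] for acc in accessions if acc in mapping}
    unmapped = [acc for acc in accessions if acc not in mapping]
    return mapped, unmapped


# Placeholder identifiers, not real accessions or gene ids.
common = intersect_accessions(["ACC_A", "ACC_B", "ACC_C"], ["ACC_C", "ACC_A", "ACC_D"])
mapped, unmapped = map_accessions(common, {"ACC_A": "GENE_1"})
print(common, mapped, unmapped)
```

    Reporting the unmapped accessions separately keeps silent losses out of the final classification.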


    Workflow

    Load Data

    mgf


    Workflow

    Search/Export Mascot

    Search/Export GPM


    Workflow

    Intersect Results

    Map Accessions to Entrez

    GoGetter

    Graph


    Gene Ontology, GO Slims : Biological Process - Weighted (Dataset 1 name)

    Biological process (go:0008150) (16.35%)

    Cellular process (go:0009987) (16.18%)

    Macromolecule metabolic process (go:0043170) (15.98%)

    Metabolic process (go:0008152) (15.70%)

    Nucleobase, nucleoside, nucleotide and nucleic aci..

    Cell communication (go:0007154) (8.73%)

    Regulation of biological process (go:0050789) (6.46%)

    Transport (go:0006810) (4.14%)

    Response to stimulus (go:0050896) (2.48%)

    Multicellular organismal development (go:0007275) (2

    Biosynthetic process (go:0009058) (0.67%)

    Cell differentiation (go:0030154) (0.56%)

    Cell death (go:0008219) (0.48%)

    Electron transport (go:0006118) (0.48%)

    Secretion (go:0046903) (0.33%)

    Membrane fusion (go:0006944) (0.33%)

    ALBU_HUMAN


    Recycling


    Digest Predict Search


    Digest Predict Load Search/Export Mascot

    Search/Export GPM


    Simple Iterative Search

    Search Partition Search (+ PTMs)
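    The Partition step splits spectra into those already identified and those that go on to the second search. A minimal sketch in plain Python follows; the score dictionary, threshold, and function name are assumptions, and spectra with no reported score are treated as unidentified.

```python
def partition(spectra, best_scores, threshold):
    """Split spectrum ids into (identified, unidentified) by their best first-pass score."""
    identified, unidentified = [], []
    for spectrum in spectra:
        if best_scores.get(spectrum, 0) >= threshold:
            identified.append(spectrum)
        else:
            unidentified.append(spectrum)
    return identified, unidentified


# Hypothetical first-pass scores; s3 received no hit at all.
identified, unidentified = partition(["s1", "s2", "s3"], {"s1": 55, "s2": 10}, threshold=40)
print(identified, unidentified)
```

    Only the unidentified spectra are resubmitted with PTMs enabled, which keeps the larger search space confined to the spectra that need it.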


    Data Preparation

    .RAW -> Extract Data -> [.dta] -> Convert Format -> .mgf -> Search -> Result URL
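    The Convert Format step can be sketched as a .dta to .mgf conversion. The code below is plain Python and assumes the conventional layouts: a .dta file whose first line holds the singly protonated precursor mass (MH+) and the charge, followed by m/z and intensity pairs, and an MGF entry delimited by BEGIN IONS and END IONS. The function names and number formatting are choices made for this example.

```python
PROTON = 1.00728


def to_mgf(title, precursor_mz, charge, peaks):
    """Format one spectrum as an MGF entry; peaks is a list of (m/z, intensity) pairs."""
    lines = ["BEGIN IONS", f"TITLE={title}", f"PEPMASS={precursor_mz:.4f}", f"CHARGE={charge}+"]
    lines += [f"{mz:.4f} {intensity:.1f}" for mz, intensity in peaks]
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


def dta_to_mgf(title, dta_text):
    """Convert the text of one .dta file into an MGF entry."""
    rows = [line.split() for line in dta_text.strip().splitlines()]
    mh, charge = float(rows[0][0]), int(rows[0][1])
    # .dta stores MH+; MGF expects the observed precursor m/z at the given charge.
    precursor_mz = (mh + (charge - 1) * PROTON) / charge
    peaks = [(float(mz), float(intensity)) for mz, intensity in rows[1:]]
    return to_mgf(title, precursor_mz, charge, peaks)


print(dta_to_mgf("scan1", "799.36 2\n100.0 10.0\n200.5 20.0\n"))
```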


    Mspire

    ms-mascot

    ms-uniprot

    ms-in_silico

    ms-gpm

    ms-fasta

    ms-unimod

    ms-xcalibur

    ms-prots

    ms-data_explorer

    constants

    molecules

    external


    Anticipated Usage

    Programmer makes tasks

    Researchers make workflows from tasks

    Definition

    Configuration

    Execution


    Importance

    Expedited analysis, reproducibility

    Evaluation studies (swap x for y)

    Teaching/Learning

    Performance is not secondary. It affects everything you do, it affects how you use an application. - Linus Torvalds


    Context


    TPP, TOPP, CPAS

    Various pipeline suites

    Pipeline in the sense of sequence

    Relatively hard to install, use, extend

    Many similarities, but only sequences


    Ongoing Research

    Iterative searching - Nesvizhskii, 2006

    Crosslink id via subsetting - Rinner, 2008

    Subsetting search engine - Li, 2009

    Enhancing peptide identification confidence by combining search methods - Alves, 2008

    Improving sensitivity by probabilistically combining results from multiple MS/MS search methodologies - Searle, 2008


    Future Directions

    Web interface, cleanup

    Implement iterative searching with subsets

    Development of crosslink search algorithm (if necessary)


    Acknowledgments

    Kirk Hansen

    Hansen Lab

    Ashley Zurawel

    Lauren Kiemele

    Ahn Lab

    John Prince

    Thesis Committee

    Bob Hodges

    Paul Fennessey

    Christine Wu

    Larry Hunter

    Brad Bendiak

    Mark Duncan