the titin problem: hitchhiking siblings during protein ... · the titin problem: hitchhiking...

THE TITIN PROBLEM: HITCHHIKING SIBLINGS DURING PROTEIN INFERENCE

KYLE LUCKE, MAX THIBEAU, LEVI ZELL, JULIANUS PFEUFFER, XIAO LIANG, AND OLIVER SERANG ////////////// DEPARTMENT OF COMPUTER SCIENCE

8 \\ ACKNOWLEDGMENT

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number P20GM103546.

This material is based upon work supported by the National Science Foundation under grant no. 1845465.

NIH COBRE

NSF CAREER

6 \\ REFERENCES

1 \\ THE FIDO PROTEIN INFERENCE MODEL 2 \\ "HITCHHIKING" PROTEINS

3 \\ EVERGREENFOREST: A SOLVER FORPROBABILISTIC LINEAR DIOPHANTINE EQUATIONS

7 \\ AVAILABILITY

5 \\ TITIN SIMULATIONS

Available free (MIT license) from https://bitbucket.org/orserang/evergreenforest

You can use the EvergreenForest library as bith a header-only C++11 library or via the modeling language (run make in EvergreenForest/src/Language).

4 \\ PROTEIN INFERENCE MODELS

Proteins Peptides Peptide probabilities

Possible presentprotein sets:

Probability Y1 is absentgiven that protein set:

If a protein is present,how often will it makea peptide?

How often does apeptide show upfrom error?

A priori, percent ofproteins are actuallypresent?

The FIDO model, a Bayesian generalization of vertex-cover methods [1]:

We can introduce a change of variables to a "cardinal model":

Proteins Peptides Peptide probabilities

Decoy X1

Decoy X2

Actually absent

Actually present

Not actually considered

PMF (A,B) (0,0) [[0.15, 0.3, 0.1], [0.2, 0.3, 0.2]]PMF (B,C) (0,0) [[0.3, 0.2], [0.4, 0.1]]PMF (A,C) (-1,0) [[0.1, 0.2], [0.1, 0.3], [0.2, 0.1]]PMF (D) (-1) [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]D-C=A+B

#change to brute force:@engine=brute_force()Pr("BRUTE FORCE RESULTS")Pr(A; B)Pr( )

#change to LBP:@engine=loopy(@dampening=0.05,@epsilon=1e-6,@max_iter=10000)Pr("LOOPY RESULTS")Pr(A;B)Pr( )

Target proteins may "hitchhike" when they share peptides with a present target. This is more common than a target protein sharing peptides with a decoy protein. It can result in many identifications with a very low FDR (all targets!), while in reality most of those targets are actually absent (actually high FDR!). Training for target-decoy discrimination may incentivize this bad result.

BRUTE FORCE RESULTSA PMF:{[0] to [1]} t:[0.486486, 0.513514]B PMF:{[0] to [1]} t:[0.432432, 0.567568]Log probability of model: -6.06943LOOPY RESULTSA PMF:{[0] to [1]} t:[0.47552, 0.52448]B PMF:{[0] to [1]} t:[0.416622, 0.583378]Log probability of model: -5.92171

REPRINT QRPhoto me for a reprint!

Automatic multithreading!

TRIOT: Every tensor operation is unrolled to the right number of for loops at runtime!

Lazy, trimmed convolution trees!

FIDO: The classic!EPIFANY: Uses a prior on each N variable, which penalizes multiple proteins per peptide. [3]Mutex: Strong prior on N variables: at most, one protein allowed per peptide.

Target-decoy labels:The classic! Choose parameters that maximizethe target-decoy discrimination and calibration.

Empirical Bayes:Choose the parameters that maximize likelihood.

PAIRED WITH

[1] O SERANG, M MACCOSS, AND W NOBLE. "EFFICIENT MARGINALIZATION TO COMPUTE PROTEIN POSTERIOR PROBABILITIES FROM SHOTGUN MASS SPECTROMETRY DATA." JOURNAL OF PROTEOME RESEARCH 9.10 (2010): 5346-5357.[2] O SERANG. "THE PROBABILISTIC CONVOLUTION TREE: EFFICIENT EXACT BAYESIAN INFERENCE FOR FASTER LC-MS/MS PROTEIN INFERENCE." PLOS ONE 9.3 (2014): E91507.[3] J PFEUFFER, TSACHSENBERG, T DIJKSTRA, O SERANG, K REINERT AND O KOHLBACHER. "EPIFANY - A METHOD FOR EFFICIENT HIGH-CONFIDENCE PROTEIN INFERENCE." (IN PREPARATION)

Cardinal models can be solved very efficiently usingprobabilistic convolution trees! [2] If we have n variables, each with states {0, 1, 2, ... k-1}, we can find all posteriors simultaneously in

FIDO EPIFANY Mutex FIDO EPIFANY Mutex

Target-decoy parameter est. Empirical Bayes parameter est.

Avg. sensitivity@<10% FDR:

0.162 0.156 0.156 0.163 0.183 0.170

These simulations show that empirical Bayes is less resistant to overfitting in the context of hitchhiking; likewise, they show that empirical Bayes with EPIFANY and Mutex models, which are more hawkish about shared peptides, are less likely to overfit. Prototyping new models with EvergreenForest is easy!

the titin problem: hitchhiking siblings during protein ... · the titin problem: hitchhiking...

Documents

hitchhiking africa africa sustainable development...

hitchhiking and selective sweeps -...

titin vol i issue 1

10 things we learned on our hitchhiking trip

a titin truncation variant co- segregating with dilated …...

molecular basis of titin exon exclusion by rbm20 and the...

hitchhiking: a magazine

the evolution of hitchhiking

invited review stretching molecular springs: elasticity of...

hitchhiking around the world

the hitchhiking game - weebly

jurnal reading mb titin

titin uchu o crema de tumbo.docx

hitchhiking 2

washington university genetics | - evidence for hitchhiking...

hitchhiking in an age of suspicion: work techniques and...

hitchhiking documentary

titin force enhancement following active stretch of skinned...

secondary and tertiary structure elasticity of titin z1z2...

evolution of hitchhiking