PEAKS: De Novo Sequencing using MS/MS spectra

Bin Ma, U. Western Ontario, Canada

Kaizhong Zhang, U. Western Ontario, Canada

Chengzhi Liang, Bioinformatics Solutions Inc. Canada

Posted on 15-Jan-2016




Outline

• Background – Tandem Mass Spectrometry

• De novo sequencing – Problem Definition and Algorithm

• Software implementation – PEAKS

• Future work


Background

• Humans have about 100,000 different proteins. Because of post-translational modifications, each protein can exist in many different versions.

• Diseases are closely related to abnormal proteins or to abnormal protein expression levels.

• Given a tissue, identifying the proteins (and their modified versions) in it is a fundamental problem in drug design.


Proteins and Peptides

• A protein is a sequence of amino acids of 20 different types.

– That is, a protein is a string over an alphabet of size 20.

• A peptide is a substring of a protein.

• The 20 amino acids have 19 distinct masses.

– I and L have the same mass and cannot be distinguished by MS/MS (or only with great difficulty).

– We therefore regard them as the same letter.
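You can check the 19-distinct-masses statement by counting distinct residue masses directly. The Python sketch below uses standard monoisotopic residue masses, which are supplied here and do not appear on the slides. The helper `merge_isobaric` is hypothetical and applies the I/L convention. At four decimal places only I and L coincide. At integer (nominal) resolution, Q and K also collide.

```python
# Standard monoisotopic residue masses in daltons (supplied for illustration).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def distinct_masses(masses, decimals):
    """Number of distinct residue masses after rounding to `decimals` places."""
    return len({round(m, decimals) for m in masses.values()})


def merge_isobaric(peptide):
    """Hypothetical helper: treat I and L as the same letter."""
    return peptide.replace("I", "L")


print(len(RESIDUE_MASS))                 # 20 amino acids
print(distinct_masses(RESIDUE_MASS, 4))  # 19: only I and L coincide
print(distinct_masses(RESIDUE_MASS, 0))  # 18: Q and K also coincide at nominal mass
print(merge_isobaric("EDLIAYLK"))        # EDLLAYLK
```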


Tandem Mass Spectrometry

• MS/MS is the only reliable way for protein identification.

[Figure: tissue → fraction → gel → protein → peptide; the protein sequence …VITK | GTDIMNEMR | SMW… is cut into peptides.]


[Figure: peptide sequence LGSSEVEQVQLVVDGVK → tandem mass spectrometer → MS/MS spectrum → de novo sequencing (or database) → LGSSEVEQVQLVVDGVK]


How Does a Peptide Fragment?

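For a peptide $A_1A_2A_3A_4$, each cleavage of a peptide bond yields an N-terminal b-ion and a C-terminal y-ion:

$$m(y_1) = 19 + m(A_4), \quad m(y_2) = 19 + m(A_4) + m(A_3), \quad m(y_3) = 19 + m(A_4) + m(A_3) + m(A_2)$$

$$m(b_1) = 1 + m(A_1), \quad m(b_2) = 1 + m(A_1) + m(A_2), \quad m(b_3) = 1 + m(A_1) + m(A_2) + m(A_3)$$

The offsets are nominal masses. The 1 is a proton, and the 19 is a water molecule plus a proton.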
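The two formula families translate directly into code. This sketch assumes integer (nominal) residue masses, which match the integer offsets 1 and 19. The residue table and the helper names `b_ions` and `y_ions` are illustrative only and are not taken from PEAKS. PAK serves as a small example peptide.

```python
# Nominal residue masses in Da (I merged into L); supplied for illustration.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
           "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def b_ions(peptide):
    """b_i = 1 + m(A_1) + ... + m(A_i) for i = 1 .. n-1."""
    ions, total = [], 1
    for aa in peptide[:-1]:
        total += NOMINAL[aa]
        ions.append(total)
    return ions


def y_ions(peptide):
    """y_i = 19 + m(A_n) + ... + m(A_{n-i+1}) for i = 1 .. n-1."""
    ions, total = [], 19
    for aa in reversed(peptide[1:]):
        total += NOMINAL[aa]
        ions.append(total)
    return ions


print(b_ions("PAK"))  # [98, 169]
print(y_ions("PAK"))  # [147, 218]
```

Each cleavage yields one b-ion and one y-ion, so b_i + y_{n-i} = m(P) + 20 for every i. For PAK this gives 98 + 218 = 169 + 147 = 316 = 296 + 20.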


Matching Sequence with Spectrum


De Novo Sequencing

• For any peptide $P = a_1 \ldots a_n$, $m(P) = \sum_{i=1}^{n} a_i$, where each letter $a_i$ also denotes its residue mass.

• De novo sequencing

– Given a spectrum and a mass value m, compute a sequence P such that m(P) = m and the matching score score(P) is maximized.
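The sketch below solves the problem statement by exhaustive search. It relies on two hypothetical stand-ins: a nominal mass table and a toy score that counts the distinct peaks explained by any b- or y-ion. PEAKS itself uses a more accurate score. The toy spectrum holds the four b/y ions of PAK. K and Q share the nominal mass 128, so several sequences tie for the best score.

```python
# Nominal residue masses in Da (I merged into L); supplied for illustration.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
           "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def ion_masses(peptide):
    """All b-ion (1 + prefix) and y-ion (19 + suffix) masses of a peptide."""
    n = len(peptide)
    prefixes = [sum(NOMINAL[c] for c in peptide[:i]) for i in range(1, n)]
    suffixes = [sum(NOMINAL[c] for c in peptide[i:]) for i in range(1, n)]
    return {1 + p for p in prefixes} | {19 + s for s in suffixes}


def score(peptide, peaks):
    """Toy score: number of distinct peaks explained by some b- or y-ion."""
    return len(ion_masses(peptide) & peaks)


def brute_force_de_novo(m, peaks):
    """Enumerate every sequence P with m(P) = m and keep the best-scoring ones."""
    best, best_seqs = -1, []

    def extend(prefix, remaining):
        nonlocal best, best_seqs
        if remaining == 0:
            s = score(prefix, peaks)
            if s > best:
                best, best_seqs = s, [prefix]
            elif s == best:
                best_seqs.append(prefix)
            return
        for aa, a in NOMINAL.items():
            if a <= remaining:
                extend(prefix + aa, remaining - a)

    extend("", m)
    return best, best_seqs


best, seqs = brute_force_de_novo(296, {98, 147, 169, 218})
print(best, "PAK" in seqs, "PAQ" in seqs)  # 4 True True
```

The number of candidate sequences grows exponentially with peptide length. This is what makes the dynamic programming on the following slides worthwhile.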


A Simpler Case – Only Y-ions


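Y-ions Determined by a Suffix

[Figure: y-ions y1, y2, y3; each y-ion has mass 19 plus the mass of its suffix.]

• score(Q) can be defined for a suffix Q, and

$$DP(u) = \max_{m(Q) = u} score(Q).$$

• For example, with u = m(LVR),

$$score(LVR) = score(VR) + f(u),$$

so

$$DP(u) = \max_{a} DP(u - a) + f(u),$$

where a ranges over the letters (each letter also denoting its mass) and f(u) is the score of the peak, if any, at the y-ion mass u + 19.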
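The recurrence can be run almost as written. The sketch makes three assumptions:

- f(u) is a hypothetical 0/1 score, equal to 1 when a peak sits at the y-ion mass u + 19.
- Masses are nominal.
- The full-length suffix u = m is not scored, because it is the precursor rather than a fragment.

The back-pointer stores the letter a prepended at each step, so backtracking reads the peptide from the N- to the C-terminus.

```python
import math

# Nominal residue masses in Da (I merged into L); supplied for illustration.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
           "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def y_only_de_novo(m, peaks):
    """Return (DP(m), peptide) using DP(u) = max_a DP(u - a) + f(u)."""
    dp = [-math.inf] * (m + 1)   # dp[u]: best score of a suffix with mass u
    back = [None] * (m + 1)      # back[u]: letter prepended to reach mass u
    dp[0] = 0
    for u in range(1, m + 1):
        # Hypothetical f(u): 1 if a peak sits at the y-ion mass u + 19.
        # The whole peptide (u == m) is the precursor, not a y-ion.
        f_u = 1 if u < m and u + 19 in peaks else 0
        for aa, a in NOMINAL.items():
            if a <= u and dp[u - a] + f_u > dp[u]:
                dp[u] = dp[u - a] + f_u
                back[u] = aa
    if dp[m] == -math.inf:
        return None, ""           # no sequence has exactly this mass
    letters, u = [], m
    while u > 0:                  # first letter found is the N-terminus
        letters.append(back[u])
        u -= NOMINAL[back[u]]
    return dp[m], "".join(letters)


score, peptide = y_only_de_novo(296, {147, 218})
print(score, peptide)
```

The table has m + 1 entries, and each is filled from |Σ| predecessors, so the run time is O(m|Σ|) rather than exponential. K, Q, GA and AG all weigh 128. The returned sequence is therefore one of several optimal answers.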


Counting Both y and b ions


Strategies

• Consider a prefix R and a suffix Q simultaneously.

• Consider only those pairs (R, Q) that satisfy a nice property, which we call “chummy”.

• Chummy pairs have two useful properties:

– The score of a chummy pair can be computed recursively from a smaller chummy pair.

– There is a series of chummy pairs that grows to the optimal solution.


Dynamic Programming

• Combining Lemmas A and B, we can compute

$$DP(u, v) = \max_{(R, Q)\ \text{chummy},\ m(R) = u,\ m(Q) = v} score(R, Q).$$

• Suppose (R, Q) is the pair maximizing DP(u, v) under the condition m(R) + m(Q) + a = m for some letter a. Then RaQ is the optimal peptide.


PEAKS – The Software


Comparison of PEAKS and Lutefisk

MALDI MS/MS, BSA

| m/z | z | Correct sequence | PEAKS (de novo) | Comments | Lutefisk (de novo) |
|---|---|---|---|---|---|
| 927.4 | 1 | YLYEIAR | YLYEIAR | correct | [276.14]EY[184.08]R |
| 1439.7 | 1 | RHPEYAVSVLLR | GVLMVDVPPADNGR | wrong (?) | no results |
| 1479.8 | 1 | LGEYGFQNALIVR | LWYGFQNALIVR | correct | no results |
| 1639.8 | 1 | KVPQVSTPTLVEVSR | RAPKVPQVSTPTLVEVSR | correct | no results |

ESI MS/MS, Cyt-c

| m/z | z | Correct sequence | PEAKS (de novo) | Comments | Lutefisk (de novo) |
|---|---|---|---|---|---|
| 482.7 | 2 | EDLIAYLK | EDLIAYLK | correct | [357.15]LAYLK |
| 584.8 | 2 | TGPNLHGLFGR | TGPNLHGLFGR | correct | TGPNLHGLFGR |
| 589.3 | 1 | GDVEK | VDVEK | V = Ac-G | VDVEK |
| 634.4 | 1 | IFVQK | IFVQK | correct | IFVQK |
| 678.3 | 1 | YIPGTK | YIPGTK | correct | YIPGTK |
| 728.8 | 2 | TGQAPGFSYTDANK | TGQAPGFSYTDANK | correct | [199.10]SAPGF[250.09]TWNK |
| 779.4 | 1 | MIFAGIK | MIFAGIK | correct | [244.12]FAGLK |
| 792.9 | 2 | KTGQAPGFSYTDAMK | KTGAGAPGFSYTDAMK | almost | [229.15]QGAPGAYQNHANK |
| 817.3 | 2 | IFVQKCAQCHTVEK | QFVTHMACCHTVEK | partial | [257.08][218.08][GP][260.08][HM]TVEK |

Apo-Myoglobin

| m/z | z | Correct sequence | PEAKS (de novo) | Comments | Lutefisk (de novo) |
|---|---|---|---|---|---|
| 662.3 | 1 | ASEDLK | ASEDLK | correct | [244.07]SALK |
| 689.9 | 2 | HGTVVLTALGGILK | HGTVVLTALGGILK | correct | HGTVVLTALG[170.1]LK |
| 748.4 | 1 | ALELFR | ALELFR | correct | [184.12]ELFR |
| 803.9 | 2 | VEADIAGHGQEVLIR | LDADIAGHGQEVLIR | almost | no results |
| 908.4 | 2 | GLSDGEWQQVLNVWGK | GLSDGEWQQVLNVWGK | correct | [170.11]SG[244.07]WQQVLNVWGK |
| 943.2 | 2 | YLEFISDAIIHVLHSK | YLEFISDAIIHVLHSK | correct | [276.1]EFLSD[184.12]LHVLHSK |
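Several near misses in the comparison table look like mass-equivalent substitutions:

- LGEY… is reported as LWY….
- VEAD… is reported as LDAD….
- KTGQ… is reported as KTGAG….
- The comment reads V = Ac-G.

A quick nominal-mass check supports this reading. Integer residue masses and the acetyl mass of 42 are supplied here and do not come from the slides. The helper `nominal_mass` is hypothetical.

```python
# Nominal residue masses in Da (I merged into L); supplied for illustration.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
           "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}
ACETYL = 42  # nominal mass added by acetylation (C2H2O)


def nominal_mass(residues, extra=0):
    """Sum of nominal residue masses, plus an optional modification mass."""
    return sum(NOMINAL[aa] for aa in residues) + extra


# (correct residues, residues reported by PEAKS), read off the comparison table
for correct, reported in [("GE", "W"), ("VE", "LD"), ("Q", "AG")]:
    print(correct, nominal_mass(correct), reported, nominal_mass(reported))
# GE 186 W 186
# VE 228 LD 228
# Q 128 AG 128
print("Ac-G", nominal_mass("G", ACETYL), "V", nominal_mass("V"))  # Ac-G 99 V 99
```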


Users


Implementation Particulars

• More accurate scoring:

– sum of the logarithmic intensities

– many other ion types

– coexisting ions, e.g., x2, y2, z2

• Deconvolution – converting multiply-charged peaks to singly-charged ones

• Recalibration – compressing or stretching the spectrum to correct calibration error

• Noise reduction
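Two of the items above are easy to sketch.

Deconvolution of an [M + zH]^z+ peak uses the standard relation [M + H]^+ = z·(m/z) − (z − 1)·m_proton. For example, the 482.7, 2+ precursor of EDLIAYLK in the comparison table maps to about 964.39, which is close to that peptide's nominal [M + H]^+ of 964.

The log-intensity score is an assumed reading of "sum of the logarithmic intensities": each matched peak contributes the log of its intensity once. The tolerance `tol` and both function names are hypothetical, and intensities are assumed to be positive.

```python
import math

PROTON = 1.00728  # proton mass in Da


def to_singly_charged(mz, z):
    """Convert an [M + zH]^z+ peak at m/z into the [M + H]^+ m/z value."""
    return z * mz - (z - 1) * PROTON


def log_intensity_score(ion_masses, peaks, tol=0.5):
    """Sum of log intensities of the peaks matched by any predicted ion.

    `peaks` is a list of (mass, intensity) pairs; each peak counts at most once.
    """
    total = 0.0
    for mass, intensity in peaks:
        if any(abs(mass - ion) <= tol for ion in ion_masses):
            total += math.log(intensity)
    return total


print(round(to_singly_charged(482.7, 2), 4))  # 964.3927
print(log_intensity_score([98, 147, 200], [(98.0, 10.0), (147.1, 100.0), (300.0, 5.0)]))
```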


Acknowledgement

• Bin Ma and Kaizhong Zhang were supported by NSERC.

• Chengzhi Liang was supported by BSI.

• We thank the BSI development team for developing the software.


Tandem Mass Spectrometer

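[Figure: precursor ions (e.g. MPSER+, SG…+, PAK+) enter a mass analyzer; the selected precursor PAK+ is fragmented (P+ and AK, PA+ and K, AK+ and P, K+ and PA); a second mass analyzer and the ion detector record the fragment ions P+, PA+, AK+, K+, which are the input to de novo sequencing.]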


Algorithm Sandwich

• DP(0,0) = 0; DP(u,v) = -∞ for (u,v) ≠ (0,0);

• for u from 1 to m/2 do
    for v from u - max(a) to u + max(a) do
      for a in Σ do
        if u < v then $DP(u+a, v) = \max\{DP(u+a, v),\ DP(u, v) + f(u+a, v)\}$
        else $DP(u, v+a) = \max\{DP(u, v+a),\ DP(u, v) + g(u, v+a)\}$

• find u, v, a such that u + v + a = m and DP(u, v) is maximized;

• backtracking.
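The Python sketch below approximates the sandwich pseudocode. It is a simplified variant, not the exact recurrence from the slides, and rests on these assumptions:

- The comparison "u < v" is read as a comparison of the newest ion masses, 1 + u for the prefix and 19 + v for the suffix. The lagging end is extended.
- The score is a hypothetical count of distinct peaks explained by b- or y-ions.
- Masses are nominal.

Under this rule, a newly added ion can coincide only with the opposite end's newest ion. Each increment therefore depends only on (u, v, a), which is the role f and g play in the recurrences.

```python
import math

# Nominal residue masses in Da (I merged into L); supplied for illustration.
MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
        "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
        "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}
MIN_MASS = min(MASS.values())


def sandwich(m, peaks):
    """Grow prefix R and suffix Q from both ends; return (best score, peptide).

    State (u, v) = (m(R), m(Q)); the newest ions are B = 1 + u and Y = 19 + v.
    The end with the smaller newest ion is extended next.
    """
    dp, back = {(0, 0): 0}, {}
    for total in range(m + 1):               # visit states in order of u + v
        for u in range(total + 1):
            v = total - u
            if (u, v) not in dp:
                continue
            for aa, a in MASS.items():
                if u + v + a + MIN_MASS > m:
                    continue                  # leave room for the middle residue
                if 1 + u < 19 + v:            # extend prefix: R -> Ra (the f case)
                    new_ion, other, nxt = 1 + u + a, 19 + v, (u + a, v)
                else:                         # extend suffix: Q -> aQ (the g case)
                    new_ion, other, nxt = 19 + v + a, 1 + u, (u, v + a)
                gain = int(new_ion in peaks and new_ion != other)
                if dp[(u, v)] + gain > dp.get(nxt, -math.inf):
                    dp[nxt] = dp[(u, v)] + gain
                    back[nxt] = ((u, v), aa)
    best, best_key = -math.inf, None
    for (u, v), s in dp.items():              # close the sandwich: P = R a Q
        for aa, a in MASS.items():
            if u + v + a != m:
                continue
            mid = set()
            if v > 0:
                mid.add(1 + u + a)            # b-ion of Ra
            if u > 0:
                mid.add(19 + v + a)           # y-ion of aQ
            total_score = s + len((mid & peaks) - {1 + u, 19 + v})
            if total_score > best:
                best, best_key = total_score, ((u, v), aa)
    if best_key is None:
        return None, ""                       # no sequence has exactly this mass
    (u, v), middle = best_key
    prefix, suffix = [], []
    while (u, v) != (0, 0):                   # undo extensions back to (0, 0)
        (pu, pv), aa = back[(u, v)]
        (prefix if pu != u else suffix).append(aa)
        u, v = pu, pv
    return best, "".join(reversed(prefix)) + middle + "".join(suffix)


print(sandwich(296, {98, 147, 169, 218}))
```

The slide's loops visit u only up to m/2, with v kept within max(a) of u. This sketch instead visits every reachable (u, v) in order of u + v, which is simpler to follow but does more work.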


Dynamic Programming

1. for u from 0 to m: $DP(u) = \max_a DP(u - a) + f(u)$

2. backtracking


Dynamic Programming

$$DP(u, v) = \max_{R \text{ is a prefix},\ Q \text{ is a suffix},\ m(R) = u,\ m(Q) = v} score(R, Q)$$

• We hope that DP(u, v) with u + v = m gives the optimal prefix and suffix.

• The optimal solution can then be obtained by concatenating the prefix and suffix.


Chummy Pairs

• Two strings Ra and bQ are called a chummy pair iff either of the following is true:

(C1) $1 + m(R) < 19 + m(bQ) \le 1 + m(Ra)$

(C2) $19 + m(Q) \le 1 + m(Ra) < 19 + m(bQ)$

• Examples: (LGE, LVR) satisfies (C2); (LGE, VR) and (LGE, R) satisfy (C1); (LG, VR) is not chummy.
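The sketch below checks the two conditions against the slide's examples using nominal residue masses, which are supplied here. The inequality symbols did not survive extraction of the slide. Whether each comparison is strict is therefore an assumption, although this choice reproduces all four examples. The function name `chummy` is hypothetical.

```python
# Nominal residue masses in Da (I merged into L); supplied for illustration.
MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
        "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
        "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def mass(s):
    return sum(MASS[c] for c in s)


def chummy(ra, bq):
    """Classify the pair (Ra, bQ) as 'C1', 'C2', or None (not chummy)."""
    r, q = ra[:-1], bq[1:]
    if 1 + mass(r) < 19 + mass(bq) <= 1 + mass(ra):    # y-ion of bQ between b(R) and b(Ra)
        return "C1"
    if 19 + mass(q) <= 1 + mass(ra) < 19 + mass(bq):   # b-ion of Ra between y(Q) and y(bQ)
        return "C2"
    return None


for pair in [("LGE", "LVR"), ("LGE", "VR"), ("LGE", "R"), ("LG", "VR")]:
    print(pair, chummy(*pair))
# ('LGE', 'LVR') C2
# ('LGE', 'VR') C1
# ('LGE', 'R') C1
# ('LG', 'VR') None
```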


Chummy pairs

• Lemma A – Suppose Ra and bQ are a chummy pair, with u = m(Ra) and v = m(bQ). If (C1) is true,

$$score(Ra, bQ) = score(R, bQ) + f(u, v).$$

If (C2) is true,

$$score(Ra, bQ) = score(Ra, Q) + g(u, v).$$


Chummy Pairs

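• Lemma B – Let P be the optimal solution. Then there is a chummy pair (R, Q) and a letter a such that P = RaQ. Also, there is a series of chummy pairs

$$(R_1, Q_1), (R_2, Q_2), \ldots, (R_n, Q_n) = (R, Q)$$

in which each pair is obtained from the previous one by adding one letter to the prefix or to the suffix.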