Author: Jason Weston et al., PNAS. Presented by Tie Wang. Protein Ranking: From Local to Global...
Post on 20-Dec-2015
![Page 1: Author: Jason Weston et., al PANS Presented by Tie Wang Protein Ranking: From Local to global structure in protein similarity network](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/1.jpg)
Author: Jason Weston et al.
PNAS
Presented by Tie Wang
Protein Ranking: From Local to Global Structure in the Protein Similarity Network
![Page 2](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/2.jpg)
Outline
Introduction; Background; Method; Experiments; Analysis
![Page 3](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/3.jpg)
Introduction
Subtle pairwise sequence similarities imply structural, functional, and evolutionary relationships among DNA and protein sequences;
Searching biosequences in an online database is analogous to searching the WWW: a search engine searches the database for the query and returns a ranked list;
A protein ranking algorithm is presented for biosequence queries;
![Page 4](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/4.jpg)
Background
Early algorithms focus only on pairwise sequence similarity (Smith-Waterman local alignment search);
Statistical models use multiple alignments for similarity search (profile-based methods such as PSI-BLAST);
Global similarity search can be mapped onto a protein similarity network.
![Page 5](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/5.jpg)
How to perform protein ranking?
Underlying idea: Google ranking. Key feature: exploiting global structure by inferring it from local hyperlink structure.
• Construct a protein similarity network.
• Add the query sequence.
• Diffuse weights through the network.
• Rank proteins upon convergence.
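These steps can be sketched as a small weight-diffusion loop. This is a minimal sketch under stated assumptions, not the paper's code: the network is a weight matrix `K` (list of lists) in which node 0 is the query and `K[j][i]` is the weight of the edge from protein j to protein i, and each step applies the assumed update y_i ← K[0][i] + alpha · Σ_j K[j][i] · y_j over the database proteins. The name `rank_proteins` and the parameters `alpha`, `tol`, and `max_iter` are hypothetical. The loop is a contraction, and so converges, when alpha < 1 and each column of weights among database proteins sums to at most 1.

```python
def rank_proteins(K, alpha=0.95, tol=1e-9, max_iter=1000):
    """Rank database proteins against the query (node 0) by weight diffusion.

    K[j][i] is the (assumed) edge weight from protein j to protein i.
    Returns (ranking, scores): ranking lists database proteins 1..m-1,
    best first; scores[i] is the converged activation of protein i.
    """
    m = len(K)
    # Start from each protein's direct similarity to the query.
    y = [K[0][i] for i in range(m)]
    y[0] = 0.0  # the query itself is not ranked
    for _ in range(max_iter):
        new_y = [0.0] * m
        for i in range(1, m):
            # Direct query weight plus activation diffused from other proteins.
            spread = sum(K[j][i] * y[j] for j in range(1, m))
            new_y[i] = K[0][i] + alpha * spread
        change = max((abs(new_y[i] - y[i]) for i in range(1, m)), default=0.0)
        y = new_y
        if change < tol:
            break
    ranking = sorted(range(1, m), key=lambda i: y[i], reverse=True)
    return ranking, y
```

For example, with `K = [[0, 0.5, 0.1, 0], [0.5, 0, 0, 0.4], [0.1, 0, 0, 0], [0, 0.4, 0, 0]]`, protein 3 has no direct edge to the query but is strongly linked to protein 1, so `rank_proteins(K)` orders the proteins as `[1, 3, 2]`: global structure lifts it above the weak direct hit, protein 2.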
![Page 6](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/6.jpg)
Algorithm
![Page 7](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/7.jpg)
Experiment
The SCOP protein 3-D structure database is used as the gold standard.
No two sequences share more than 95% similarity.
The 7329 proteins are split by superfamily: 379 superfamilies for training and 332 for testing.
Three networks are generated using BLAST and PSI-BLAST.
![Page 8](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/8.jpg)
Experiment
The algorithm is compared with two alternative settings: (1) only local structure is considered; (2) non-local edges are used but weak edges are removed. The second setting is only slightly worse than the proposed algorithm.
$K_{ij} = \exp\left(-S_j(i)/\sigma\right)$
where $S_j(i)$ is the E-value assigned to protein $i$ given query $j$, and $\sigma$ is a scale parameter.
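A minimal sketch of turning E-values into edge weights, assuming the exponential form exp(-S_j(i)/sigma): strong hits (E-values near 0) get weights near 1, and pairs a query did not report get no edge. The input layout (`S[j][i]` holds the E-value of protein i for query j, or `None`), the function name `evalue_to_weights`, and the choice of `sigma` are assumptions for illustration.

```python
import math


def evalue_to_weights(S, sigma):
    """Convert a matrix of E-values into edge weights.

    S[j][i] is the E-value of protein i when protein j is the query, or None
    if protein i was not reported for query j (assumed input layout).
    Returns K with K[j][i] = exp(-S[j][i] / sigma); unreported pairs and
    self-pairs get weight 0.
    """
    n = len(S)
    K = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j and S[j][i] is not None:
                # Small E-value (strong hit) -> weight close to 1.
                K[j][i] = math.exp(-S[j][i] / sigma)
    return K
```

Rows index the query and columns the hit, so row 0 holds the weights from protein 0 used as a query.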
![Page 9](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/9.jpg)
Analysis
Bower et al., Science, vol. 306, 2004
Cluster structure
![Page 10](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/10.jpg)
Author: Rui Kuang et al.
Bioinformatics
Presented by Tie Wang
Motif-based protein ranking by network propagation
![Page 11](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/11.jpg)
Outline
Introduction; Background; Method; Experiments; Analysis
![Page 12](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/12.jpg)
Background
Direct measures of pairwise sequence similarity have proven effective for classification.
Performance drops when detecting remote homologs that share only subtle sequence similarity.
Such sequences share a conserved structure in at least some components.
The problem is formulated based on this observation.
![Page 13](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/13.jpg)
Protein-motif bipartite network
• Each protein contains a set of motifs.
• Each motif belongs to a set of proteins.
• Their relationships are mapped to a bipartite graph, as shown on the left.
• The edge weight indicates the probability that motif x is in protein y.
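The bipartite network can be stored as a protein-by-motif connectivity matrix. A minimal sketch, assuming motif hits arrive as hypothetical `(protein, motif, probability)` triples: rows follow the protein list and columns the motif list, so row a holds the motifs in protein a and the non-zero entries of column b give the proteins motif b belongs to. Both function names are hypothetical.

```python
def build_connectivity(proteins, motifs, hits):
    """Build the protein-motif connectivity matrix H of the bipartite network.

    proteins, motifs: identifiers for the node sets P and F.
    hits: iterable of (protein, motif, probability) triples (hypothetical
    input format); probability is the edge weight, i.e. the probability
    that the motif occurs in the protein.
    H[a][b] is the weight between proteins[a] and motifs[b] (0.0 = no edge).
    """
    p_index = {p: a for a, p in enumerate(proteins)}
    f_index = {f: b for b, f in enumerate(motifs)}
    H = [[0.0] * len(motifs) for _ in proteins]
    for protein, motif, prob in hits:
        H[p_index[protein]][f_index[motif]] = prob
    return H


def proteins_with_motif(H, proteins, b):
    """The set of proteins motif b belongs to: non-zero entries of column b."""
    return [proteins[a] for a in range(len(proteins)) if H[a][b] > 0]
```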
![Page 14](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/14.jpg)
MotifProp Algorithm
Set P represents the protein sequences and set F represents the motifs; H is the connectivity matrix between them.
A row-normalized version of H is also defined.
A vector of initial values is defined for F (the motifs).
A vector of initial values is defined for P (the proteins).
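Row normalization of H can be sketched as follows, assuming H is a list of lists with one row per protein: each row is divided by its sum, so a protein's motif weights add up to 1. Keeping all-zero rows (proteins with no motif hits) as zeros is an assumption made here to avoid dividing by zero; the function name is hypothetical.

```python
def row_normalize(H):
    """Return a copy of H in which every non-empty row sums to 1.

    Each protein's row is divided by its total motif weight; rows that are
    all zero (no motif hits) are kept as zeros instead of dividing by 0.
    """
    normalized = []
    for row in H:
        total = sum(row)
        if total > 0:
            normalized.append([v / total for v in row])
        else:
            normalized.append([0.0] * len(row))
    return normalized
```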
![Page 15](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/15.jpg)
MotifProp Algorithm
Convergence of MotifProp is guaranteed. The problem is reformulated according to the following rule, using the same row-normalized H and initial value vectors as on the previous slide.
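The exact MotifProp update rule is in the slide image and is not reproduced here. The sketch below is a hypothetical alternating propagation in the same spirit: protein scores are refreshed from their motifs' scores and motif scores from their proteins' scores, each through a row-normalized matrix, plus the initial vectors `p0` and `f0`. Because both normalized matrices have rows summing to at most 1, any `alpha` < 1 makes the combined update a contraction, so this particular loop converges. The update form, `motif_prop`, and all parameters are assumptions.

```python
def motif_prop(H, p0, f0, alpha=0.9, tol=1e-9, max_iter=1000):
    """Hypothetical alternating propagation on a protein-motif bipartite graph.

    H: proteins x motifs connectivity matrix (list of lists).
    p0, f0: initial protein and motif score vectors.
    Assumed update (not taken from the paper):
        p <- p0 + alpha * Hp f,   f <- f0 + alpha * Hf p
    where Hp is H row-normalized and Hf is H transposed, then row-normalized.
    """
    def row_norm(M):
        out = []
        for row in M:
            total = sum(row)
            out.append([v / total for v in row] if total > 0 else [0.0] * len(row))
        return out

    n_p, n_f = len(H), len(H[0])
    Hp = row_norm(H)  # each protein averages the scores of its motifs
    Hf = row_norm([[H[a][b] for a in range(n_p)] for b in range(n_f)])  # each motif averages its proteins
    p, f = list(p0), list(f0)
    for _ in range(max_iter):
        new_p = [p0[a] + alpha * sum(Hp[a][b] * f[b] for b in range(n_f)) for a in range(n_p)]
        new_f = [f0[b] + alpha * sum(Hf[b][a] * p[a] for a in range(n_p)) for b in range(n_f)]
        change = max(abs(x - y) for x, y in zip(new_p + new_f, p + f))
        p, f = new_p, new_f
        if change < tol:
            break
    ranking = sorted(range(n_p), key=lambda a: p[a], reverse=True)
    return ranking, p, f
```

With `H = [[1, 0], [1, 1], [0, 1]]` and only protein 0 hit by the query, protein 1 (sharing motif 0 with it) ranks above protein 2, which is reached only through a second motif hop.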
![Page 16](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/16.jpg)
Edge weighting scheme
PSI-BLAST E-values are assigned to the edges between pairs of protein nodes.
Gaussian edge weights are calculated from these E-values.
The Gaussian weights from the query to each protein are assigned as the initial values.
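Assigning initial values from the query can be sketched as below. The slides say only that Gaussian weights are computed from PSI-BLAST E-values; the specific form exp(-E² / (2σ²)), the `sigma` parameter, the function names, and the input dictionary mapping protein index to query E-value are all assumptions.

```python
import math


def gaussian_weight(evalue, sigma):
    """Assumed Gaussian kernel on an E-value: exp(-E^2 / (2 sigma^2))."""
    return math.exp(-(evalue ** 2) / (2.0 * sigma ** 2))


def initial_protein_values(query_evalues, n_proteins, sigma):
    """Initial protein vector: Gaussian weight of each protein's E-value to the query.

    query_evalues maps protein index -> PSI-BLAST E-value against the query
    (hypothetical input); proteins the query did not hit start at 0.
    """
    p0 = [0.0] * n_proteins
    for i, evalue in query_evalues.items():
        p0[i] = gaussian_weight(evalue, sigma)
    return p0
```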
![Page 17](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/17.jpg)
Value estimation
Sq(i) is the E-value between protein i and query q.
Eq(j) is the E-value between the jth motif and the ith protein.
![Page 18](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/18.jpg)
Estimation of the substitution score
The substitution score between a k-mer f and a sequence x can be estimated with the formula shown on this slide.
sl is a log value; it implies that a k-mer whose score S is below the threshold can be a motif hit against sequence x.
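The slide's estimator itself is not recoverable from the transcript. As a general illustration of the quantity being estimated, the sketch below computes the best ungapped substitution score of a k-mer f against every length-k window of a sequence x. The toy +2/-1 identical/different scoring stands in for a real substitution matrix such as BLOSUM62, and the function name is hypothetical.

```python
def kmer_substitution_score(f, x, score=None):
    """Best ungapped score of k-mer f against any length-k window of sequence x.

    score(a, b) gives the substitution score for residues a and b. The default
    +2 (identical) / -1 (different) scheme is a toy stand-in for a real
    substitution matrix. Returns (best_score, start) or (None, None) if x is
    shorter than f.
    """
    if score is None:
        def score(a, b):
            return 2 if a == b else -1
    k = len(f)
    best, best_start = None, None
    for start in range(len(x) - k + 1):
        # Sum residue-by-residue scores for this window (no gaps allowed).
        s = sum(score(a, b) for a, b in zip(f, x[start:start + k]))
        if best is None or s > best:
            best, best_start = s, start
    return best, best_start
```

For example, `kmer_substitution_score("ACD", "KACEACD")` finds the exact match at position 4 with score 6.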
![Page 19](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/19.jpg)
Sequential MotifProp
Empirical experiments suggest that using a weighted linear combination of multiple motifs does not improve the results.
Instead, a simple multiple-motif-set scheme is applied: the motif nodes F are partitioned into n sets, where F(i) is the set of motifs from the ith motif set.
Each set F(i) represents its motifs as a group rather than as individual motifs.
![Page 20](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/20.jpg)
Motif-rich regions
![Page 21](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/21.jpg)
Experiments
7329 protein domains with known 3-D structure are taken from SCOP.
They are divided into training (4246) and testing (3083) sets.
An additional 10602 sequences from the Swiss-Prot database are used. Performance is evaluated with ROC curves.
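The slides report evaluation with ROC curves. A common summary in remote homology detection is the ROC_n score (for example ROC50): the area under the ROC curve up to the first n false positives, normalized to lie between 0 and 1. A minimal sketch, assuming each query's ranked list is given as booleans (True for a protein in the query's superfamily) and that false positives missing from the list rank below everything listed; whether these papers use exactly this variant is not stated here, and `roc_n` is a hypothetical name.

```python
def roc_n(labels, n=50):
    """ROC_n score for one query's ranked list (best hit first).

    labels[k] is True if the k-th ranked protein is a true positive (e.g. in
    the query's superfamily). The score is the area under the ROC curve up
    to the first n false positives, normalized to [0, 1]: the number of true
    positives ranked above each of those false positives, summed and divided
    by n times the total number of true positives.
    """
    total_tp = sum(1 for lab in labels if lab)
    if total_tp == 0:
        return 0.0
    tp_seen = fp_seen = area = 0
    for lab in labels:
        if lab:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break
    # Assumption: if the list has fewer than n false positives, the missing
    # ones rank below every listed protein, so each sees all true positives.
    area += (n - fp_seen) * tp_seen
    return area / (n * total_tp)
```

A perfect ranking (all true positives first) scores 1.0, and a list whose first n entries are all false positives scores 0.0.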
![Page 22](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/22.jpg)
Results of classification
![Page 23](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/23.jpg)
Results of classification (cont)
![Page 24](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/24.jpg)
Results on motif-rich regions
![Page 25](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d455503460f94a226e2/html5/thumbnails/25.jpg)
Conclusion
Two protein ranking methods are presented for protein classification.
A protein similarity matrix and a protein/motif propagation network are the underlying structures.
Both methods are simple but innovatively formulated, and they give better results than current approaches. Analysis of the results plays an important role.