![Page 1: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/1.jpg)
Discovering gapped binding sites
Chengwei Lei
Dr. Jianhua Ruan
University of Texas at San Antonio
Department of Computer Science
![Page 2: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/2.jpg)
Outline of Talk
• Motif Finding Background
• Gapped Motif Finding
– Chen’s method
– SPACE
• The PSO-motif algorithm
• Future Work
![Page 3: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/3.jpg)
Introduction/Motivation
• Introduction: Identifying transcription factor binding sites is an important part of analyzing genetic regulation. Many programs have been developed for discovering these motifs.
• Motivation: Previous algorithms need too much memory or time to produce a result; this work aims to develop a new algorithm that finds motifs using less memory and less time.
![Page 4: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/4.jpg)
What is motif finding?
• Motif finding, the process of discovering a meaningful pattern (of nucleotides or amino acids) that is shared by two or more sequences, is an important part of the study of gene function.
![Page 5: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/5.jpg)
Cells respond to environment
Heat
Food supply
Responds to environmental conditions
Various external messages
![Page 6: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/6.jpg)
Regulation of Genes
Gene
Promoter
RNA polymerase (Protein)
Transcription Factor (TF) (Protein)
DNA
![Page 7: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/7.jpg)
Regulation of Genes
Gene
Regulatory Element, TF binding site, TF binding motif, cis-regulatory motif (element)
RNA polymerase (Protein)
Transcription Factor (TF) (Protein)
DNA
![Page 8: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/8.jpg)
Regulation of Genes
Gene
RNA polymerase
Transcription Factor (Protein)
Regulatory Element
DNA
![Page 9: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/9.jpg)
Regulation of Genes
Gene
RNA polymerase
Transcription Factor
Regulatory Element
DNA
New protein
![Page 10: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/10.jpg)
Real example
![Page 11: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/11.jpg)
Real example
![Page 12: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/12.jpg)
Look Like
• I need a refrigerator, so I go to a refrigerator shop, I try to pick a very beautiful refrigerator from a lot of refrigerator(s). Finally I decide that I will buy a GE refrigerator.
![Page 13: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/13.jpg)
Look Like
• I need a refrigeretor, so I go to a rafrigerator shop, I try to pick a very beautiful refragerator from a lot of refrigerater(s). Finally I decide that I will buy a GE refrigarator.
![Page 14: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/14.jpg)
Mismatch
…TACGAT…
…TAAAAT…
…TATACT…
…GATAAT…
…TATAAT…
…TATGTT…
![Page 15: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/15.jpg)
Real example
• …TACGAT…
• …TAAAAT…
• …TATACT…
• …GATAAT…
• …TATAAT…
• …TATGTT…
Consensus: TATAAT
• refrigeretor
• rafrigerator
• refragerator
• refrigerater
• refrigarator
Consensus: refrigerator
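The consensus in both examples can be recovered with a per-column majority vote over the aligned strings. The following is a minimal sketch of that idea, not the method used by any of the tools discussed in this talk; the function name `consensus` is an assumption made for illustration, and the inputs are the strings shown on this slide.

```python
from collections import Counter

def consensus(words):
    """Return the most common character at each position of equal-length strings."""
    if len({len(w) for w in words}) != 1:
        raise ValueError("all strings must have the same length")
    # zip(*words) yields one column of characters per aligned position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*words))

sites = ["TACGAT", "TAAAAT", "TATACT", "GATAAT", "TATAAT", "TATGTT"]
words = ["refrigeretor", "rafrigerator", "refragerator",
         "refrigerater", "refrigarator"]

print(consensus(sites))  # TATAAT
print(consensus(words))  # refrigerator
```

Each misspelling differs from the consensus at only one position, so the majority vote at every column is unambiguous here. Real binding sites are usually not pre-aligned, which is what makes motif finding hard.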
![Page 16: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/16.jpg)
Gapped Motif
Gene
RNA polymerase
Transcription Factor
Regulatory Element
DNA
New protein
![Page 17: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/17.jpg)
Gapped DNA binding?
![Page 18: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/18.jpg)
Gapped Motif
• Together
• Separate
![Page 19: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/19.jpg)
Together
• Red + blue + green = 5/25 + 15/15 + 5/25 = 25/65
• Red + xxx + green = 5/25 + xxx + 5/25 = 10/50
mutations
n = 5
L
5+3+5
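The fractions on this slide can be read as pooled counts: mismatched positions over total positions, summed across the colored blocks. That reading is an assumption, since the figure itself is not available. Under it, the small sketch below reproduces the two totals; `pooled_mismatch` is a hypothetical helper name.

```python
def pooled_mismatch(blocks):
    """Sum (mismatches, positions) pairs and return the pooled totals."""
    mismatches = sum(m for m, _ in blocks)
    positions = sum(p for _, p in blocks)
    return mismatches, positions

# (mismatches, positions) per block, taken from the slide.
red, blue, green = (5, 25), (15, 15), (5, 25)

together = pooled_mismatch([red, blue, green])  # gap counted as part of the motif
masked = pooled_mismatch([red, green])          # gap positions ignored ("xxx")

print(together, round(together[0] / together[1], 3))  # (25, 65) 0.385
print(masked, round(masked[0] / masked[1], 3))        # (10, 50) 0.2
```

Treating the gap as ordinary motif positions nearly doubles the apparent mismatch rate, which is why a contiguous-motif model handles gapped sites poorly.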
![Page 20: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/20.jpg)
Separate
• Red = 5/25
• Green = 5/25
• Pink = 4/25
mutations
n = 5
L
![Page 21: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/21.jpg)
![Page 22: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/22.jpg)
What can we do with the gap?
• Chen’s method
• SPACE
• PSO
![Page 23: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/23.jpg)
Chen’s method
• ChIP-chip experiment
– Get a positive set Ga
– Get a negative set G-a
![Page 24: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/24.jpg)
Compact Blocks
• Patterns that are found in Ga with a proportion larger than a predefined value (25% by default) are included in the pattern list.
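As a rough illustration of this support threshold, the sketch below keeps a candidate pattern only if the fraction of positive-set sequences containing it exceeds 25%. It assumes exact substring matching and uses made-up sequences; the function names are hypothetical and this is not Chen's implementation.

```python
def support_fraction(pattern, sequences):
    """Fraction of sequences that contain `pattern` at least once."""
    if not sequences:
        return 0.0
    return sum(pattern in seq for seq in sequences) / len(sequences)

def frequent_patterns(candidates, positive_set, min_fraction=0.25):
    """Keep candidates whose support in the positive set exceeds min_fraction."""
    return [p for p in candidates
            if support_fraction(p, positive_set) > min_fraction]

# Toy positive set Ga (made-up sequences, for illustration only).
Ga = ["ACGTTATAAT", "GGTATAATCC", "TTTTGGGCCC", "CATGCATGCA"]

print(frequent_patterns(["TATAAT", "GGG", "CATG"], Ga))  # ['TATAAT']
```

"GGG" and "CATG" each occur in exactly one of four sequences, so their support equals 25% and does not pass a strictly-larger-than test.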
![Page 25: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/25.jpg)
![Page 26: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/26.jpg)
Compact Blocks
• Long enough patterns (3 containing at least six nonwildcards) are taken as candidate motifs. Short patterns (2 blocks of 3 or 4 bp) are filtered
![Page 27: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/27.jpg)
![Page 28: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/28.jpg)
Hit/Seq ratio
• The sequences that match the pattern are called the supporting sequences of a pattern. It is possible that a pattern matches a sequence at more than one position.
• The Hit/Seq ratio of a pattern is the average number of occurrences of a pattern among its supporting sequences.
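The definition above translates directly into code: count the occurrences of the pattern in each sequence, keep only the sequences with at least one occurrence, and average. The sketch below assumes exact matching and counts overlapping occurrences; both choices, and the function names, are assumptions made for illustration.

```python
def count_hits(pattern, sequence):
    """Count occurrences of `pattern` in `sequence`, allowing overlaps."""
    last_start = len(sequence) - len(pattern)
    return sum(sequence.startswith(pattern, i) for i in range(last_start + 1))

def hit_seq_ratio(pattern, sequences):
    """Average number of occurrences among the supporting sequences."""
    hits = [count_hits(pattern, s) for s in sequences]
    supporting = [h for h in hits if h > 0]  # sequences that match at least once
    if not supporting:
        return 0.0
    return sum(supporting) / len(supporting)

# Made-up sequences: two hits, one hit, and no hit for "ACG".
seqs = ["ACGACGTT", "TTACGTTT", "GGGGGGGG"]
print(hit_seq_ratio("ACG", seqs))  # 1.5
```

The third sequence does not support the pattern, so it is excluded from the denominator: three hits over two supporting sequences gives 1.5.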
![Page 29: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/29.jpg)
Block Filtering
• A compact block is filtered out if its Hit/Seq ratio is larger than 15.
• A large Hit/Seq ratio implies that the compact blocks are frequently repeated in a single promoter region.
![Page 30: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/30.jpg)
• In addition to the Hit/Seq ratio, the authors also use an upper threshold on f-a (the proportion of sequences in G-a that contain a pattern P) to eliminate repetitive elements present across different promoter sequences. A pattern is retained only if f-a is less than 0.16.
![Page 31: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/31.jpg)
Growing Gapped Motifs
• Growing gapped motifs is similar to growing compact motifs.
![Page 32: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/32.jpg)
Pattern Ranking
• An identified pattern is filtered out before ranking if its Hit/Seq ratio is ≥2, which is considered a reasonable upper bound for selecting reliable patterns.
![Page 33: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/33.jpg)
• Sd is the preferential occurrence of a pattern in Ga relative to G-a.
• Sp is a formula value.
• Sc is the conservation score.
![Page 34: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/34.jpg)
Sd
• The proportions of sequences in Ga and G-a that contain a pattern P are denoted as fa and f-a. The one-tailed two-sample proportion test can be performed as follows:
• Patterns with a z score (Sd) smaller than z_{1−0.01} are treated as nonsignificant and are removed before the ranking process.
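The slide's formula did not survive extraction, so the sketch below uses the textbook one-tailed two-sample proportion z test with a pooled variance estimate. Whether Chen's method pools the variance in exactly this way is an assumption, and the counts are made up; `proportion_z` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(hits_a, n_a, hits_b, n_b):
    """One-tailed two-sample proportion z score for f_a > f_b (pooled variance)."""
    f_a, f_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # pattern is in all or none of the sequences
    return (f_a - f_b) / se

# Critical value z_{1-0.01} of the standard normal distribution.
critical = NormalDist().inv_cdf(1 - 0.01)

# Made-up counts: pattern found in 30 of 100 Ga sequences, 50 of 1000 G-a sequences.
s_d = proportion_z(30, 100, 50, 1000)
print(round(critical, 3), s_d > critical)  # 2.326 True
```

A pattern whose score falls below the critical value would be dropped before ranking, as described in the bullet above.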
![Page 35: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/35.jpg)
Sp
![Page 36: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/36.jpg)
Sc
• Sc is the degree of evolutionary conservation among a set of orthologous sequences.
• (from Saccharomyces paradoxus, Saccharomyces kudriavzevii, Saccharomyces mikatae, and Saccharomyces bayanus)
![Page 37: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/37.jpg)
Result
![Page 38: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/38.jpg)
![Page 39: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/39.jpg)
Key point
• Filter !!
![Page 40: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/40.jpg)
SPACE
• Generation of motif candidates
– Consider L=20
![Page 41: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/41.jpg)
• Consider L=20, r=0.5, l=5, d=1 and q=4.
![Page 42: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/42.jpg)
Refinding Motif
• GAAGAnnnnnnnTAGAAAnn is a spaced motif of five sequences.
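To make the notation concrete, the sketch below scans a sequence for a spaced motif in which `n` matches any base, optionally allowing a few mismatches in the conserved positions. It is only an illustration of how such a motif is matched, not the SPACE algorithm itself; the function names and the test sequence are invented for this example.

```python
def matches_at(motif, sequence, start, max_mismatches=0):
    """True if `motif` fits `sequence` at `start`; 'n' in the motif matches any base."""
    window = sequence[start:start + len(motif)]
    if len(window) < len(motif):
        return False
    mismatches = sum(1 for m, s in zip(motif, window) if m != "n" and m != s)
    return mismatches <= max_mismatches

def find_sites(motif, sequence, max_mismatches=0):
    """Return every start position where the spaced motif occurs."""
    return [i for i in range(len(sequence) - len(motif) + 1)
            if matches_at(motif, sequence, i, max_mismatches)]

motif = "GAAGAnnnnnnnTAGAAAnn"
# Made-up sequence: the two conserved blocks are separated by 7 arbitrary bases.
seq = "CC" + "GAAGA" + "ACGTACG" + "TAGAAA" + "TT" + "GG"

print(find_sites(motif, seq))  # [2]
# One substitution in the first block is still found when one mismatch is allowed.
print(find_sites(motif, seq.replace("GAAGA", "GATGA"), max_mismatches=1))  # [2]
```

Only the 11 non-wildcard positions contribute to the mismatch count, so the content of the gap never penalizes a site.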
![Page 43: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/43.jpg)
• Motif Score(M) =
• +
• Let E(M, e) be the expected frequency of M with at most e mutations, based on a set of background sequences.
![Page 44: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/44.jpg)
Why PSO method
Background
• Particle swarm optimization (PSO) is a population-based stochastic optimization technique inspired by the social behavior of bird flocking and fish schooling.
• PSO shares many similarities with evolutionary computation techniques such as Genetic Algorithms (GA), but it is simpler and faster than GA.
• It has been shown to be effective in optimizing difficult multidimensional problems in a variety of fields.
• PSO has been widely applied in artificial neural networks (ANN), nonlinear control, electromagnetics, antenna design, and bioinformatics.
![Page 45: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/45.jpg)
Some key terms used to describe PSO
Agent (Particle): One single individual in the swarm
Position: An agent’s N-dimensional coordinates, which represent a solution to the problem
Swarm: The entire collection of agents
Fitness: A single number representing the goodness of a given solution
Pbest: The location in parameter space of the best fitness returned for a specific agent
Gbest: The location in parameter space of the best fitness returned for the entire swarm
V: The velocity of each agent
![Page 46: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/46.jpg)
gbest
Pbest1
Pbest2
$$V_n = V_n + C_1\,\mathrm{rand}()\,(p_{best,n} - x_n) + C_2\,\mathrm{rand}()\,(g_{best,n} - x_n)$$
$$x_n = x_n + V_n$$
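The velocity and position updates on this slide are the core of a basic PSO loop. The following is a minimal sketch on a continuous toy problem (the sphere function), not the PSO-motif algorithm of this talk. The search range, swarm size, constants `c1` and `c2`, and the velocity clamp `v_max` are all assumptions chosen for illustration.

```python
import random

def pso(fitness, dim, n_agents=20, n_iter=100, c1=2.0, c2=2.0, v_max=1.0, seed=0):
    """Minimize `fitness` starting from random positions in [-5, 5]^dim."""
    rng = random.Random(seed)
    x = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_agents)]
    v = [[0.0] * dim for _ in range(n_agents)]
    pbest = [pos[:] for pos in x]
    pbest_fit = [fitness(pos) for pos in x]
    best = min(range(n_agents), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iter):
        for i in range(n_agents):
            for n in range(dim):
                # V_n = V_n + C1*rand()*(pbest_n - x_n) + C2*rand()*(gbest_n - x_n)
                v[i][n] += (c1 * rng.random() * (pbest[i][n] - x[i][n])
                            + c2 * rng.random() * (gbest[n] - x[i][n]))
                # Clamp the velocity (an added safeguard, not shown on the slide).
                v[i][n] = max(-v_max, min(v_max, v[i][n]))
                # x_n = x_n + V_n
                x[i][n] += v[i][n]
            fit = fitness(x[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = x[i][:], fit
    return gbest, gbest_fit

def sphere(p):
    """Toy fitness: squared distance from the origin (lower is better)."""
    return sum(c * c for c in p)

best_pos, best_fit = pso(sphere, dim=2)
print(best_pos, best_fit)
```

Each agent is pulled toward its own best position and toward the swarm's best position, with random weights on both pulls. Because the update depends on coordinate differences, it presupposes a search space where such differences are meaningful.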
![Page 47: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/47.jpg)
• One agent’s movement in the PSO algorithm.
![Page 48: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/48.jpg)
Flow chart of the PSO algorithm
![Page 49: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/49.jpg)
• In a typical PSO algorithm, one wishes to control the velocity so that at the beginning stage the particles can fly around quickly inside the search space, and when a particle approaches the optimal solution, it should slow down so it can converge quickly.
![Page 50: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/50.jpg)
• …TACGATA…
• …TAAAAT…
• …TATACT…
• …GATAAT…
• …TATGAT…
• …TATGTT…
• One can achieve this if the fitness function is continuous, since the velocity is updated according to the distances between the current position and the positions of pbest and gbest.
![Page 51: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/51.jpg)
How to solve
• Remap
• Redefine
![Page 52: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/52.jpg)
Remap the neighborhood information
[Figure: positions 1, 2, …, N; example strings "A C G T T C C A T … A C G T T C C T"; labels "mis is 6" and "mis is 1"]
![Page 53: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/53.jpg)
Redefine
• Green: Current
• Red: Gbest
• Pink: Pbest
• Blue: Random
n = 5
L
![Page 54: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/54.jpg)
Redefine
• Good for gapped motif finding.
– Quick
– Flexible
– High sensitivity
– High extensibility
![Page 55: Discovering gapped binding sites](https://reader036.vdocument.in/reader036/viewer/2022062304/56812fee550346895d956566/html5/thumbnails/55.jpg)
Thank you !