![Page 1: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/1.jpg)
Transcription Factor-DNA binding prediction
Tahmina Ahmed, Prosunjit Biswas, Iffat Sharmin Chowdhury, Badri Sampath
![Page 2: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/2.jpg)
Motivation
• Build a model from the labeled DNA sequences, use it to label the unlabeled DNA sequences, and gain experience with some real-world machine learning problems.
![Page 3: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/3.jpg)
Approaches
• K-mer based
- Fixed-length K-mer
- K-mer with mismatches
- Using regular expressions
• PWM based
- MEME and MAST
• Combined model
- Unite both models
![Page 4: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/4.jpg)
K-mer Approach Based on Regular Expression
Motivation
2-mers are the most frequent K-mers in the sequences, so this approach focuses mainly on 2-mers.
Strategy
- For any two 2-mers X and Y, generate the regular expressions X(.*)Y and Y(.*)X.
- Use these regular expressions as candidate attributes.
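The strategy above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual script: the function names are hypothetical, the alphabet is assumed to be A, C, G, T, and X and Y are assumed to be distinct 2-mers. Each attribute is 1 if the expression matches anywhere in the sequence and 0 otherwise.

```python
import re
from itertools import permutations, product

ALPHABET = "ACGT"  # assumed DNA alphabet


def two_mers():
    """Return all 16 possible 2-mers over the alphabet."""
    return ["".join(p) for p in product(ALPHABET, repeat=2)]


def candidate_patterns():
    """Return X(.*)Y for every ordered pair of distinct 2-mers.

    Ordered pairs cover both X(.*)Y and Y(.*)X for each unordered pair.
    """
    return [f"{x}(.*){y}" for x, y in permutations(two_mers(), 2)]


def regex_features(sequence, patterns):
    """Binary attribute vector: 1 if the pattern occurs in the sequence."""
    return [1 if re.search(p, sequence) else 0 for p in patterns]
```

With 16 possible 2-mers this yields 240 candidate attributes per sequence, for example `regex_features("ACTTGT", ["AC(.*)GT", "GT(.*)AC"])` gives `[1, 0]`.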
![Page 5: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/5.jpg)
Classifier Selection
Fig : 9 classifiers applied to the TF data sets
Algorithms are numbered as follows -
(1) Logistic (2) SMO (3) NaiveBayes (4) BayesianLogisticRegression (5) KStar (6) Bagging (7) LogitBoost (8) RandomForest (9) J48
Summary -
* 9 classifiers were applied to 10 data sets; 3 of them are shown
* choosing a single best classifier is not a trivial task
* the same classifier behaves differently on different data sets
![Page 6: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/6.jpg)
Change in Accuracy due to Different Classifiers
Fig : The performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_3 data set
Fig : The performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_5 data set
Summary -
* the choice of classifier has a large effect on accuracy
* classifiers have to be chosen carefully
![Page 7: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/7.jpg)
Change in Accuracy due to Different K-mer Length
Fig : The performance of K-mers of different lengths (4-mer, 5-mer, 6-mer) on the TF_3 data set
Summary -
* K-mer length also affects accuracy
* finding a single best length is not trivial
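To make the effect of K concrete, here is a minimal sketch of fixed-length K-mer counting. The function names are hypothetical and the alphabet is assumed to be A, C, G, T; the project's own scripts and ARFF writer are not shown in the slides.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Fixed-order count vector over all len(alphabet)**k possible k-mers."""
    counts = kmer_counts(sequence, k)
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]
```

The attribute space grows quickly with K: 4-mers give 256 attributes, 5-mers give 1,024, and 6-mers give 4,096, which is one reason the best length can differ between data sets.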
![Page 8: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/8.jpg)
Attribute Space Selection
Fig : The performance for different numbers of selected K-mers on the TF_4 data set
Summary -
* the number of attributes considered also affects accuracy
* accuracy increases as more attributes are considered, but beyond a saturation point it decreases
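The slides do not say how the K-mers were ranked before selection. As one possible illustration, the sketch below keeps the N K-mers whose presence differs most between bound (positive) and unbound (negative) sequences. The scoring rule and the function name are assumptions, not the project's method, and both input lists are assumed to be non-empty.

```python
def top_n_kmers(positives, negatives, k, n):
    """Return the n k-mers with the largest presence gap between classes."""

    def presence_fraction(seqs):
        # Fraction of sequences that contain each k-mer at least once.
        seen = {}
        for s in seqs:
            for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
                seen[kmer] = seen.get(kmer, 0) + 1
        return {kmer: c / len(seqs) for kmer, c in seen.items()}

    pos = presence_fraction(positives)
    neg = presence_fraction(negatives)
    scores = {
        kmer: abs(pos.get(kmer, 0.0) - neg.get(kmer, 0.0))
        for kmer in set(pos) | set(neg)
    }
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda kmer: (-scores[kmer], kmer))[:n]
```

Sweeping `n` and re-evaluating the classifier at each value is one way to locate the saturation point described above.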
![Page 9: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/9.jpg)
PWM based Analysis on Accuracy (TF_1 data set)
Fig : J48, minW 6 - maxW 15, no. of sites 10
Fig : J48, minW 6 - maxW 15, no. of motifs 5
Summary -
* accuracy increases with more motifs when the no. of sites is fixed
* accuracy increases with more sites when the no. of motifs is fixed
* what happens when we increase both?
![Page 10: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/10.jpg)
PWM based Analysis
Fig : Accuracy varies with the no. of motifs and the no. of sites
* 1st bar shows the no. of sites
* 2nd bar shows the no. of motifs
* 3rd bar shows the accuracy
* the point is that accuracy decreases when we increase both the no. of motifs and the no. of sites.
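For readers unfamiliar with PWMs, the sketch below shows the general idea of building a position weight matrix from aligned sites and scoring a sequence with it. It is a simplified illustration only, not MEME's or MAST's actual algorithm: the pseudocount, the uniform background of 0.25, and the function names are all assumptions.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites (assumed uniform background)."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def best_pwm_score(sequence, pwm):
    """Highest log-odds score over all windows; sequence must be at least as long as the PWM."""
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )
```

In this picture, more sites give better-estimated columns and more motifs give more attributes per sequence, which may help explain why both parameters affect accuracy.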
![Page 11: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/11.jpg)
Extra Work for TF_20
Fig : Flow diagram of building the new model for TF_20 (boxes: K-mer + PWM; Sequences identified differently; Sequences identified by both models; Biased 2-mer Model; Newly Labeled Sequences; The New Model for TF_20)
Summary -
* we have done some extra work for TF_20
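The flow diagram for TF_20 separates sequences that the K-mer and PWM models label identically from those they label differently. A minimal sketch of that split is shown below; the function name and the dictionary-based inputs are hypothetical, and how the Biased 2-mer Model then handles each group is not specified in the slides.

```python
def split_by_agreement(kmer_labels, pwm_labels):
    """Split sequence ids into those both models agree on and the rest.

    Both arguments map a sequence id to that model's predicted label.
    """
    agreed = {}
    disputed = []
    for seq_id, label in kmer_labels.items():
        if pwm_labels.get(seq_id) == label:
            agreed[seq_id] = label  # identified by both models
        else:
            disputed.append(seq_id)  # identified differently
    return agreed, disputed
```

Sequences in `agreed` could then serve as newly labeled training data, while those in `disputed` would need a further decision step.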
![Page 12: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/12.jpg)
AUC based on the Feedback (bonus model)
Fig : AUC of 10 data sets based on last submission
* accuracy improved over the first submission
* the PWM model did not give good results
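The slides report AUC from submission feedback without saying how it was computed. For reference, AUC can be computed as the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as half. The sketch below assumes labels of 1 (bound) and 0 (unbound); the function name is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so AUC is less sensitive to class imbalance than raw accuracy.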
![Page 13: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/13.jpg)
Participation
| Member | Background Study | Working with Tools | Working with Models | Parameter Tuning | Automation |
| --- | --- | --- | --- | --- | --- |
| Badri Sampath | DNA, RNA, protein, motif | AlignAce, MEME, MAST | PWM | K-mer | ARFF writer, MAST output writer |
| Iffat Sharmin Chowdhury | Protein, motif, transcription | Weka, AlignAce, ScanAce | K-mer | PWM | Script for FASTA, Weka |
| Prosunjit Biswas | DNA, transcription, K-mer | MEME, MAST | K-mer | PWM | Script for RE, for new model |
| Tahmina Ahmed | MEME, MAST, PWM | MEME, MAST, Weka | PWM | K-mer | Script for MEME, MAST |
![Page 14: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/14.jpg)
Acknowledgment
![Page 15: Final Project Transciption Factor DNA binding Prediction](https://reader033.vdocument.in/reader033/viewer/2022051609/547b3670b379593a2b8b4cd0/html5/thumbnails/15.jpg)
Questions?