MICHAEL MORRA CSE 4939W Detection of Transcription Factor Binding Sites

Post on 17-Dec-2015

MICHAEL MORRA, CSE 4939W

Detection of Transcription Factor Binding Sites

Project Recap

Implement a method that accurately and precisely locates transcription factor binding sites within a DNA sequence.

Data set: 4 species (Human, Mouse, Fruit Fly & Yeast), 52 transcription factors, 524 binding sites

Image from: http://www.cs.uiuc.edu/homes/sinhas/work.html

Multiple Sequence Alignment

To be able to analyze the data effectively, each transcription factor’s binding sites need to be aligned

ClustalW2

>s1
GACTTTTCGCT
>s2
CGATTTTCTCG
>s3
GCATTTTCCCA
>s4
AGAGAAAACCC
>s5
GAATAACCCAAGAGAAA
>s6
ACAGAAAAATC
>s7
CGAGAAAATCG
>s8
TGGTTTTCCCG
>s9
GGGTTTCTCCC

Scoring

Berg and von Hippel method

l = length of the sequence to be scored
j = position in the sequence
n_j(b) = number of times base b occurs at position j in the alignment
t_j = base at position j in the sequence to be scored
n_j(0) = number of times the most common base occurs at position j in the alignment
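The formula itself did not survive in this transcript. A minimal sketch of the scoring step, assuming the commonly cited form of the Berg and von Hippel score, score = sum over j = 1..l of ln((n_j(t_j) + 0.5) / (n_j(0) + 0.5)), is shown below. The 0.5 pseudocount, the function names, and the `ColumnCounts` type are illustrative assumptions and are not taken from the slides.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cmath>
#include <stdexcept>
#include <string>
#include <vector>

// Counts of A, C, G, T in one alignment column (hypothetical helper type).
using ColumnCounts = std::array<int, 4>;

// Map a base to its index in ColumnCounts; -1 for gaps or unknown symbols.
int baseIndex(char base) {
    const std::string bases = "ACGT";
    const char upper = static_cast<char>(std::toupper(static_cast<unsigned char>(base)));
    const std::size_t pos = bases.find(upper);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// Score a sequence of length l against l columns of counts.
// Assumed form: sum_j ln((n_j(t_j) + pseudo) / (n_j(0) + pseudo)).
// The default pseudocount of 0.5 is an assumption, not stated in the slides.
double bergVonHippelScore(const std::vector<ColumnCounts>& counts,
                          const std::string& seq,
                          double pseudo = 0.5) {
    if (seq.size() != counts.size()) {
        throw std::invalid_argument("sequence and matrix lengths differ");
    }
    double score = 0.0;
    for (std::size_t j = 0; j < seq.size(); ++j) {
        const ColumnCounts& col = counts[j];
        const int best = *std::max_element(col.begin(), col.end());  // n_j(0)
        const int idx = baseIndex(seq[j]);
        const int observed = (idx >= 0) ? col[idx] : 0;              // n_j(t_j)
        score += std::log((observed + pseudo) / (best + pseudo));
    }
    return score;
}
```

Under this form the consensus sequence scores exactly 0 and every mismatch contributes a negative term, so scores closer to 0 indicate a better match.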

Implementation

Microsoft Visual Studio - C++

Input:
Multiple Sequence Alignment of a transcription factor’s binding sites (.txt file)
All binding sites of a species (.txt file)

Output:
Scores
Results of Leave One Out Cross Validation (testing and efficiency purposes)

Implementation

Scoring Algorithm
Input: Alignment
Function: Create the scoring matrix

Leave One Out Cross Validation
Input: Alignment and Binding Sites
Function: Test the effectiveness of the scoring matrix
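As an illustration of the "create the scoring matrix" step, the sketch below tallies base counts per column from an already aligned set of binding sites. It assumes the alignment has been read into equal-length strings with `-` marking gaps; the function name `buildCountMatrix` and the `ColumnCounts` type are hypothetical, and gaps or other symbols are simply not counted.

```cpp
#include <array>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Counts of A, C, G, T in one alignment column (hypothetical helper type).
using ColumnCounts = std::array<int, 4>;

// Build one ColumnCounts per alignment column from equal-length aligned rows.
// Gap characters ('-') and any non-ACGT symbols are skipped.
std::vector<ColumnCounts> buildCountMatrix(const std::vector<std::string>& aligned) {
    if (aligned.empty()) {
        return {};
    }
    const std::size_t width = aligned.front().size();
    std::vector<ColumnCounts> counts(width, ColumnCounts{0, 0, 0, 0});
    for (const std::string& row : aligned) {
        if (row.size() != width) {
            throw std::invalid_argument("aligned rows must have equal length");
        }
        for (std::size_t j = 0; j < width; ++j) {
            switch (std::toupper(static_cast<unsigned char>(row[j]))) {
                case 'A': ++counts[j][0]; break;
                case 'C': ++counts[j][1]; break;
                case 'G': ++counts[j][2]; break;
                case 'T': ++counts[j][3]; break;
                default: break;  // gap or ambiguous base: not counted
            }
        }
    }
    return counts;
}
```

Keeping raw counts rather than frequencies means the same matrix can be reused with different pseudocounts when scoring.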

Functionality

Sequence to be scored is shorter than the alignment: slide the sequence over the alignment and take the highest scoring portion

Sequence to be scored is longer than the alignment: slide the alignment over the sequence and take the highest scoring portion
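The two sliding cases can be handled by one loop, since in each case only one of the two operands has room to shift. The sketch below is one possible way to do this, not the project's actual code: it assumes a per-column count matrix, a log-ratio score with a 0.5 pseudocount, and hypothetical names (`windowScore`, `bestSlidingScore`).

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cmath>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

using ColumnCounts = std::array<int, 4>;  // counts of A, C, G, T in a column

int baseIndex(char base) {
    const std::string bases = "ACGT";
    const char upper = static_cast<char>(std::toupper(static_cast<unsigned char>(base)));
    const std::size_t pos = bases.find(upper);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// Score len bases of seq (from seqStart) against len columns (from colStart).
double windowScore(const std::vector<ColumnCounts>& counts, std::size_t colStart,
                   const std::string& seq, std::size_t seqStart,
                   std::size_t len, double pseudo) {
    double score = 0.0;
    for (std::size_t k = 0; k < len; ++k) {
        const ColumnCounts& col = counts[colStart + k];
        const int best = *std::max_element(col.begin(), col.end());
        const int idx = baseIndex(seq[seqStart + k]);
        const int observed = (idx >= 0) ? col[idx] : 0;
        score += std::log((observed + pseudo) / (best + pseudo));
    }
    return score;
}

// Try every offset of the shorter operand within the longer one; keep the best.
double bestSlidingScore(const std::vector<ColumnCounts>& counts,
                        const std::string& seq, double pseudo = 0.5) {
    if (counts.empty() || seq.empty()) {
        throw std::invalid_argument("matrix and sequence must be non-empty");
    }
    const std::size_t len = std::min(counts.size(), seq.size());
    const std::size_t colShifts = counts.size() - len;  // nonzero if seq is shorter
    const std::size_t seqShifts = seq.size() - len;     // nonzero if seq is longer
    double best = -std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c <= colShifts; ++c) {
        for (std::size_t s = 0; s <= seqShifts; ++s) {
            best = std::max(best, windowScore(counts, c, seq, s, len, pseudo));
        }
    }
    return best;
}
```

One caveat with this sketch: a shorter sequence is scored over fewer columns than a full-length one, so its best score is not directly comparable without some normalization.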

Testing: Scoring Algorithm/LOOCV

Unit testing will be done on each function and critical portions of code as they are implemented

Once it is determined that the code is functioning correctly and all formulas are providing correct results, implementation can continue

Testing: Overall Performance

To determine the effectiveness of the algorithm, a cross validation technique is used

This technique involves leaving one binding site out when the multiple sequence alignment is performed, and then scoring that left-out sequence

If the algorithm is effective, the left-out sequence should score higher than most of the other binding sites within that species (higher than 80-90% of them)
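The cross validation loop can be sketched as follows. In the procedure described above, the alignment is rebuilt without the left-out site each time; this sketch hides that step behind a caller-supplied `Scorer` callback, so the callback, the function names, and the use of a strict "scores higher than" comparison are all assumptions made for illustration.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback: build a model from the training sites (including any
// realignment) and return the score of the candidate under that model.
using Scorer = std::function<double(const std::vector<std::string>& training,
                                    const std::string& candidate)>;

// Fraction of the other scores that the left-out score strictly exceeds.
double fractionOutscored(double leftOutScore, const std::vector<double>& otherScores) {
    if (otherScores.empty()) {
        return 0.0;
    }
    std::size_t beaten = 0;
    for (double s : otherScores) {
        if (leftOutScore > s) {
            ++beaten;
        }
    }
    return static_cast<double>(beaten) / static_cast<double>(otherScores.size());
}

// For each site of one transcription factor: leave it out, score it against a
// model built from the rest, and compare it with the species' other sites.
std::vector<double> leaveOneOut(const std::vector<std::string>& sites,
                                const std::vector<std::string>& otherSites,
                                const Scorer& score) {
    std::vector<double> fractions;
    for (std::size_t i = 0; i < sites.size(); ++i) {
        std::vector<std::string> training;
        for (std::size_t k = 0; k < sites.size(); ++k) {
            if (k != i) {
                training.push_back(sites[k]);
            }
        }
        const double leftOut = score(training, sites[i]);
        std::vector<double> others;
        for (const std::string& other : otherSites) {
            others.push_back(score(training, other));
        }
        fractions.push_back(fractionOutscored(leftOut, others));
    }
    return fractions;
}
```

Each returned value can then be compared against the 80-90% target to judge how well the scoring matrix separates true binding sites from the rest.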

Progress

Alignments Complete

Scoring Algorithm Mostly Complete

Leave One Out Cross Validation Partially Complete

Remaining Schedule

Nov 15th – Nov 19th
Finish implementation and testing of scoring algorithm

Nov 20th – Nov 29th
Finish implementation of leave one out algorithm
Begin testing of entire program’s effectiveness

Nov 30th – Dec 6th
Complete testing
Tweak program to run more effectively/accurately

Questions?