Using DSP To Find Coding Regions in DNA Sequences
Anna de Regt and Rio Akasaka
Background
• Exons are coding regions of DNA
• DNA exhibits period-three behavior
Frequency Domain methods
1. Sliding window DFT
2. Auto-regressive method
Time Domain methods
1. Second-order resonant filter
2. Average Magnitude Difference Function (AMDF)
Working with C. elegans
Caenorhabditis elegans
• Soil nematode of approx. 1 mm length
• First multicellular organism to have its genome completely sequenced, first published in 1998
• Our data is from WormBase: F56F11.4a, Chromosome III
Did you know? C. elegans made news when it was discovered that specimens had survived the Space Shuttle Columbia disaster in February 2003.
Shifting Window Method
1. Fourier Transform the base sequences
2. Evaluate over an N-length window and then shift window
3. Take k=N/3
$$S(k) = |U_A(k)|^2 + |U_C(k)|^2 + |U_T(k)|^2 + |U_G(k)|^2$$
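The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names, the default window length of 351 and the step size are assumptions made for the example. Each base gets a binary indicator sequence, only the DFT coefficient at k = N/3 is evaluated per window, and the four squared magnitudes are summed.

```python
import cmath

def indicator(seq, base):
    """Binary indicator sequence: 1 where seq[n] is the given base, else 0."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]

def dft_bin(x, k):
    """Single DFT coefficient X(k) of the finite sequence x."""
    n_len = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * n / n_len)
               for n, v in enumerate(x))

def period3_spectrum(seq, window=351, step=1):
    """S(N/3) for every window position (window length is an assumed default)."""
    if window % 3 != 0:
        raise ValueError("window length must be a multiple of 3")
    k = window // 3  # the period-three bin, k = N/3
    tracks = [indicator(seq, b) for b in "ACGT"]
    out = []
    for start in range(0, len(seq) - window + 1, step):
        # Sum of squared DFT magnitudes over the four base tracks
        out.append(sum(abs(dft_bin(t[start:start + window], k)) ** 2
                       for t in tracks))
    return out
```

A perfectly period-three input such as `"ATG" * 20` gives a large, constant S(N/3), while a homopolymer such as `"A" * 60` gives a value near zero, which is the contrast the method relies on.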
[Figure: shifting filter of DFT. x-axis: relative base location n; y-axis: S(N/3)]
Digital Filter Method
• 733 times faster than the shifting window method
• Apply an antinotch filter at ω=2π/3
x(n) → H(z) → y(n)
The filter
Notch filter as seen in fdatool
$$H(z) = \frac{(1 - R^2) - 2R\cos(\omega_0)\,z^{-1}}{1 - 2R\cos(\omega_0)\,z^{-1} + R^2 z^{-2}}$$
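A filter that passes only a narrow band around ω = 2π/3 can be sketched as a two-pole resonator run over each base indicator sequence. The sketch below is an illustration under stated assumptions: the pole radius R = 0.992 is an assumed value, the numerator is simplified to a constant 1, and the function names are hypothetical. Only the pole pair at R·e^{±jω0} is the point of the example.

```python
import math

def resonator(x, r=0.992, w0=2 * math.pi / 3):
    """Two-pole resonator: y[n] = x[n] + 2 r cos(w0) y[n-1] - r^2 y[n-2].

    Assumption: constant numerator (gain not normalised).
    """
    a1 = 2.0 * r * math.cos(w0)
    a2 = r * r
    y1 = y2 = 0.0  # y[n-1] and y[n-2]
    out = []
    for v in x:
        y0 = v + a1 * y1 - a2 * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out

def period3_power(seq, r=0.992):
    """Sum of squared filter outputs over the A, C, G and T indicator tracks."""
    total = [0.0] * len(seq)
    for base in "ACGT":
        u = [1.0 if ch == base else 0.0 for ch in seq.upper()]
        for n, y in enumerate(resonator(u, r)):
            total[n] += y * y
    return total
```

Each output sample costs a handful of multiplications, instead of a full window of them per position, which is where the speed advantage over the windowed DFT comes from.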
[Figure: with antinotch filter. x-axis: relative base location n; y-axis: Ya+c+g+t]
Exploiting Strand Symmetry
Chargaff's rule: DNA from any cell of all organisms should have a 1:1 ratio of pyrimidine and purine bases,
i.e. %A + %G = %T + %C
$$u_{ACTG}(n) = a\,u_A(n) + c\,u_C(n) + g\,u_G(n) + t\,u_T(n)$$
$$u_{TG}(n) = g\,u_G(n) + t\,u_T(n)$$
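Because exactly one indicator sequence is 1 at each position, the weighted sum reduces to looking up the weight of the base found at n. A minimal sketch of that idea follows; the function name and the weight values are hypothetical and chosen only for illustration.

```python
def weighted_sequence(seq, weights):
    """Weighted sum of indicator sequences; bases missing from `weights` count as 0."""
    return [weights.get(ch, 0.0) for ch in seq.upper()]

# All four bases with equal (illustrative) weights
u_actg = weighted_sequence("ATGGCT", {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})

# Only G and T information, with illustrative weights g = 0.3 and t = 0.7
u_tg = weighted_sequence("ATGGCT", {"G": 0.3, "T": 0.7})
# u_tg == [0.0, 0.7, 0.3, 0.3, 0.0, 0.7]
```

The practical gain is that a single numeric sequence needs only one pass through the filter instead of one pass per base.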
[Figure: only using G and T information. x-axis: relative base location n; y-axis: Yg+t]
Using a Single Filter
• Optimize the parameters t and g in
$$u_{TG}(n) = g\,u_G(n) + t\,u_T(n)$$
• Have them add up to 1
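One way to picture the optimization is a grid search over g with t = 1 - g. The slides do not state the objective, so the score used here (mean filter output power inside annotated exons divided by the mean outside) is an assumption, as are the function names, the exon mask input, the resonator with constant numerator and R = 0.992.

```python
import math

def resonate(x, r=0.992, w0=2 * math.pi / 3):
    """Two-pole resonator centred on w0 (constant numerator assumed)."""
    a1, a2 = 2.0 * r * math.cos(w0), r * r
    y1 = y2 = 0.0
    out = []
    for v in x:
        y0 = v + a1 * y1 - a2 * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out

def score(seq, exon_mask, g):
    """Hypothetical objective: mean output power inside exons / outside.

    exon_mask must contain both True and False entries.
    """
    t = 1.0 - g  # enforce g + t = 1
    u = [g if ch == "G" else t if ch == "T" else 0.0 for ch in seq.upper()]
    power = [y * y for y in resonate(u)]
    inside = [p for p, m in zip(power, exon_mask) if m]
    outside = [p for p, m in zip(power, exon_mask) if not m]
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return mean_in / (mean_out + 1e-12)  # small guard against division by zero

def best_weights(seq, exon_mask, steps=20):
    """Grid search over g in [0, 1]; returns the pair (g, t)."""
    candidates = (i / steps for i in range(steps + 1))
    best_g = max(candidates, key=lambda g: score(seq, exon_mask, g))
    return best_g, 1.0 - best_g
```

With the sum constrained to 1 there is a single free parameter, so an exhaustive grid is cheap; a finer grid or a proper optimizer could replace it without changing the structure.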
[Figure: optimization with single filter. x-axis: relative base location n; y-axis: optimized Yg+t]
Quadratic Window
• Filter the high-frequency components out of each peak
[Figure: with quadratic window. x-axis: relative base location]
Acknowledgements
• Grateful acknowledgement to Prof. Erik Cheever
1. Fox & Carreira. A Digital Signal Processing Method for Gene Prediction with Improved Noise Suppression.
2. Vaidyanathan & Yoon. Digital filters for gene prediction applications.
3. Vaidyanathan & Yoon. Gene and Exon Prediction using Allpass-based Filters.
4. Anastassiou. DSP in Genomics: Processing and Frequency-Domain Analysis of Character Strings.
5. Anastassiou. Frequency-domain analysis of biomolecular sequences.
6. Akhtar. Comparison of Gene and Exon Prediction Techniques for Detection of Short Coding Regions.