Using DSP To Find Coding Regions in DNA Sequences Anna de Regt and Rio Akasaka

Uploaded by ivo on 20-Jan-2016



Page 1: Using DSP To Find Coding Regions in DNA Sequences

Using DSP To Find Coding Regions in DNA Sequences

Anna de Regt and Rio Akasaka

Page 2

Background

• Exons are coding regions of DNA
• Coding regions exhibit period-three behavior, a consequence of their three-base codon structure

Frequency-domain methods
1. Sliding-window DFT
2. Auto-regressive method

Time-domain methods
1. Second-order resonant filter
2. Average Magnitude Difference Function (AMDF)
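The AMDF named among the time-domain methods is not defined on these slides. The sketch below assumes the definition commonly used in pitch detection, $D(k) = \frac{1}{L-k}\sum_n |x(n) - x(n+k)|$ for a length-$L$ signal, applied to each base's 0/1 indicator sequence and summed over the four bases; a period-three signal then shows a dip at lag $k = 3$. The function names and the toy sequence are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def indicator_sequences(seq):
    """Return a (4, L) array: row b is 1 where seq has base BASES[b], else 0."""
    seq = seq.upper()
    return np.array([[1.0 if ch == b else 0.0 for ch in seq] for b in BASES])


def amdf(seq, max_lag):
    """Assumed AMDF, summed over the four indicator sequences.

    D[k] = sum over bases of mean_n |u(n) - u(n + k)|, for k = 1..max_lag.
    Index 0 of the returned array is unused and left at 0.
    """
    u = indicator_sequences(seq)
    n = u.shape[1]
    d = np.zeros(max_lag + 1)
    for k in range(1, max_lag + 1):
        d[k] = np.abs(u[:, : n - k] - u[:, k:]).mean(axis=1).sum()
    return d


# Toy "coding-like" sequence with a strict three-base repeat (hypothetical data).
d = amdf("ATG" * 30, max_lag=6)
print(np.argmin(d[1:]) + 1)  # prints 3: the lag with the smallest difference
```

On real sequences the dip at lag 3 is shallow rather than zero, so in practice D(3) would be compared against neighbouring lags inside a sliding window.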

Page 3

Working with C. elegans

Caenorhabditis elegans
• Soil nematode approximately 1 mm long
• First multicellular organism to have its genome completely sequenced, first published in 1998

Our data is from WormBase: F56F11.4a, Chromosome III

Did you know? C. elegans made news when it was discovered that specimens had survived the Space Shuttle Columbia disaster in February 2003.

Page 5

Shifting Window Method

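1. Fourier-transform the base indicator sequences $u_A(n)$, $u_C(n)$, $u_T(n)$, $u_G(n)$ (1 where that base occurs, 0 elsewhere)

2. Compute the DFT over an N-length window, then shift the window along the sequence

3. Take k = N/3, the DFT bin that corresponds to a period of three bases

$$S(k) = |U_A(k)|^2 + |U_C(k)|^2 + |U_T(k)|^2 + |U_G(k)|^2$$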
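The three steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the 0/1 indicator mapping, the default window length N = 351 (any multiple of 3 works, so that k = N/3 is an integer bin), the one-base step, and the function names are all assumptions. Because only the bin k = N/3 is needed, each window uses one dot product with a complex exponential rather than a full FFT.

```python
import numpy as np


def indicator_sequences(seq):
    """(4, L) array of 0/1 indicator sequences, rows in the order A, C, T, G."""
    seq = seq.upper()
    return np.array([[1.0 if ch == b else 0.0 for ch in seq] for b in "ACTG"])


def sliding_dft_power(seq, N=351, step=1):
    """Return S(N/3) = sum_b |U_b(N/3)|^2 for each N-length window.

    N = 351 is an assumed default; it must be a multiple of 3.
    Only bin k = N/3 is needed, so it is computed as one dot product
    with exp(-2j*pi*k*n/N) per window instead of a full FFT.
    """
    if N % 3:
        raise ValueError("N must be a multiple of 3 so that k = N/3 is a DFT bin")
    u = indicator_sequences(seq)
    k = N // 3
    kernel = np.exp(-2j * np.pi * k * np.arange(N) / N)
    starts = range(0, u.shape[1] - N + 1, step)
    return np.array([np.sum(np.abs(u[:, s:s + N] @ kernel) ** 2) for s in starts])


# Toy check (hypothetical data): a strict three-base repeat vs. a homopolymer.
print(sliding_dft_power("ATG" * 10, N=9)[:3])
print(sliding_dft_power("A" * 20, N=9)[:3])
```

Plotted against window position, large values of S(N/3) flag candidate coding regions; the cost is an N-term sum for every window position.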

Page 6

[Plot: shifting filter of DFT; y-axis $S(N/3)$, x-axis relative base location $n$]

Page 7

Digital Filter Method

• 733 times faster than the shifting-window method
• Apply an antinotch filter at ω = 2π/3

x(n) → H(z) → y(n)
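A minimal sketch of the filter pipeline: each base's 0/1 indicator sequence is passed through H(z) and the squared outputs are summed. This sketch assumes one standard second-order antinotch, $H(z) = \frac{1-R^2}{2}\cdot\frac{1 - z^{-2}}{1 - 2R\cos(\omega_0)z^{-1} + R^2 z^{-2}}$ with $\omega_0 = 2\pi/3$, which has zeros at $z = \pm 1$ and a gain close to 1 at $\omega_0$ for $R$ near 1. It is not necessarily the exact filter the authors designed in fdatool; R = 0.99 and the names `antinotch_coeffs`, `iir2`, and `antinotch_output` are hypothetical.

```python
import numpy as np


def antinotch_coeffs(R=0.99, w0=2 * np.pi / 3):
    """Assumed antinotch: (1-R^2)/2 * (1 - z^-2) / (1 - 2R cos(w0) z^-1 + R^2 z^-2)."""
    g = (1 - R**2) / 2
    b = np.array([g, 0.0, -g])
    a = np.array([1.0, -2 * R * np.cos(w0), R**2])
    return b, a


def iir2(b, a, x):
    """Second-order difference equation (a[0] == 1), zero initial state."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y


def antinotch_output(seq, R=0.99):
    """Sum over the four bases of y_b(n)^2, where y_b is the filtered indicator u_b."""
    b, a = antinotch_coeffs(R)
    seq = seq.upper()
    total = np.zeros(len(seq))
    for base in "ACGT":
        u = np.array([1.0 if ch == base else 0.0 for ch in seq])
        total += iir2(b, a, u) ** 2
    return total


y = antinotch_output("ATG" * 200)  # toy three-base repeat (hypothetical data)
print(y[-3:])
```

Per base, the filter costs a few multiply-adds regardless of window length, whereas a direct sliding DFT evaluates an N-term sum at every position; that difference is the source of the filter approach's speed advantage.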

Page 8

The filter

Notch filter as seen in fdatool

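$H(z)$ is a second-order filter with denominator

$$1 - 2R\cos(\omega_0)\,z^{-1} + R^2 z^{-2} = \left(1 - Re^{j\omega_0}z^{-1}\right)\left(1 - Re^{-j\omega_0}z^{-1}\right),$$

so its poles lie at $z = Re^{\pm j\omega_0}$, with $\omega_0 = 2\pi/3$ and $0 < R < 1$; the closer $R$ is to 1, the narrower the passband around $\omega_0$.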
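A quick numerical check of the filter's second-order denominator, $1 - 2R\cos(\omega_0)z^{-1} + R^2 z^{-2}$: written in powers of $z$, its roots should have magnitude $R$ and angles $\pm\omega_0$. The value R = 0.99 is an assumed example.

```python
import numpy as np

R = 0.99                # assumed pole radius (must be < 1 for stability)
w0 = 2 * np.pi / 3      # period-three frequency

# 1 - 2R cos(w0) z^-1 + R^2 z^-2, multiplied by z^2: z^2 - 2R cos(w0) z + R^2
poles = np.roots([1.0, -2 * R * np.cos(w0), R**2])
print(np.abs(poles))             # both magnitudes equal R
print(np.sort(np.angle(poles)))  # -w0 and +w0
```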

Page 9

[Plot: with antinotch filter; y-axis $y_{a+c+g+t}$, x-axis relative base location $n$]

Page 10

Exploiting Strand Symmetry

Chargaff's rule: double-stranded DNA from any cell of any organism has a 1:1 ratio of pyrimidine to purine bases,

i.e. %A = %T and %G = %C, so %A + %G = %C + %T
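For double-stranded DNA the rule follows from base pairing: every A on one strand faces a T on the other, and every G faces a C. The sketch below (hypothetical helper names, toy sequence) counts purines and pyrimidines over a strand plus its reverse complement:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the complementary strand, read 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def purine_pyrimidine_counts(strand):
    """Count purines (A, G) and pyrimidines (C, T) over both strands of a duplex."""
    duplex = strand.upper() + reverse_complement(strand)
    purines = sum(duplex.count(b) for b in "AG")
    pyrimidines = sum(duplex.count(b) for b in "CT")
    return purines, pyrimidines


print(purine_pyrimidine_counts("ATGGCGTTAAC"))  # (11, 11)
```

A single strand need not be balanced: this toy strand alone has 6 purines and 5 pyrimidines. The 1:1 ratio comes from counting both strands.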

$$u_{ACTG}(n) = a\,u_A(n) + c\,u_C(n) + g\,u_G(n) + t\,u_T(n)$$

$$u_{TG}(n) = g\,u_G(n) + t\,u_T(n)$$
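The weighted sequence can be built directly from the bases. In this sketch the numeric weights are placeholders (the slides do not give values for a, c, g, t), and `weighted_sequence` is a hypothetical name; leaving A and C out of the weights gives $u_{TG}(n)$.

```python
import numpy as np


def weighted_sequence(seq, weights):
    """u(n) = sum_b w_b * u_b(n) for b in {A, C, G, T}.

    Exactly one indicator u_b(n) equals 1 at each position, so the sum reduces
    to the weight of the base found at n; bases absent from `weights` get 0.
    """
    return np.array([weights.get(ch, 0.0) for ch in seq.upper()])


# u_TG(n) with placeholder weights g = 0.6, t = 0.4 (not values from the slides)
u_tg = weighted_sequence("ATGCGT", {"G": 0.6, "T": 0.4})
print(u_tg)
```

Because H(z) is linear, filtering $g\,u_G(n) + t\,u_T(n)$ once gives the same result as filtering $u_G$ and $u_T$ separately and adding g and t times the outputs, so one filter pass replaces two.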

Page 11

[Plot: only using G and T information; y-axis $y_{g+t}$, x-axis relative base location $n$]

Page 12

Using a Single Filter

• Optimize the parameters t and g in $u_{TG}(n) = g\,u_G(n) + t\,u_T(n)$

• Have them add up to 1 ($g + t = 1$)
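The slides do not state the objective used to choose g and t. As a purely hypothetical sketch, assume exon boundaries are known for a training sequence and score each g by the ratio of the mean squared filter output inside exons to the mean outside them; a grid search over g in [0, 1] with t = 1 − g then picks the best value. The scoring function, the grid, the antinotch numerator, R = 0.99, and all names are assumptions.

```python
import numpy as np


def antinotch(x, R=0.99, w0=2 * np.pi / 3):
    """Assumed antinotch (zeros at z = +/-1, poles at R*exp(+/-j*w0)), applied to x."""
    gain = (1 - R**2) / 2
    a1, a2 = -2 * R * np.cos(w0), R**2
    xp = np.concatenate([[0.0, 0.0], np.asarray(x, dtype=float)])  # zero initial state
    y = np.zeros(len(xp))
    for n in range(2, len(xp)):
        y[n] = gain * (xp[n] - xp[n - 2]) - a1 * y[n - 1] - a2 * y[n - 2]
    return y[2:]


def score(seq, exon_mask, g):
    """Hypothetical objective: mean squared output inside exons / mean outside."""
    t = 1.0 - g
    u = np.array([g if ch == "G" else t if ch == "T" else 0.0 for ch in seq.upper()])
    y2 = antinotch(u) ** 2
    return y2[exon_mask].mean() / (y2[~exon_mask].mean() + 1e-12)


def best_g(seq, exon_mask, steps=21):
    """Grid search over g in [0, 1] with t = 1 - g; returns (g, t)."""
    grid = np.linspace(0.0, 1.0, steps)
    scores = [score(seq, exon_mask, g) for g in grid]
    g = float(grid[int(np.argmax(scores))])
    return g, 1.0 - g


seq = "ACGT" * 50 + "GTA" * 100   # toy data: 200 "non-coding" + 300 "coding" bases
mask = np.arange(len(seq)) >= 200  # True where the toy exon is
print(best_g(seq, mask))
```

The constraint g + t = 1 leaves a single free parameter, which is why a one-dimensional grid search suffices here.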

Page 13

[Plot: optimization with single filter; y-axis optimized $y_{g+t}$, x-axis relative base location $n$]

Page 14

Quadratic Window

• Filter the high-frequency components out of each peak
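The slides do not define the quadratic window. One plausible reading, assumed here, is smoothing the filter output by convolution with a normalized parabolic window $w(m) = 1 - \left(\frac{m}{M+1}\right)^2$ for $|m| \le M$ (similar to a Welch window), which acts as a low-pass filter and removes fast ripple from each peak. The half-width M = 50 and the function names are hypothetical.

```python
import numpy as np


def quadratic_window(M):
    """Parabolic window of length 2M+1, normalized to unit sum (assumed definition)."""
    m = np.arange(-M, M + 1)
    w = 1.0 - (m / (M + 1)) ** 2
    return w / w.sum()


def smooth(y, M=50):
    """Low-pass y by convolving with the quadratic window; output has len(y)."""
    return np.convolve(y, quadratic_window(M), mode="same")


# Toy signal (hypothetical): a slow bump plus period-3 ripple.
n = np.arange(1000)
y = np.exp(-((n - 500) / 150.0) ** 2) * (1 + 0.5 * np.cos(2 * np.pi * n / 3))
print(smooth(y)[495:505])
```

Normalizing the window to unit sum keeps the height of slowly varying peaks unchanged while attenuating the ripple.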

Page 15

[Plot: with quadratic window; x-axis relative base location]

Page 16

Acknowledgements
• Grateful acknowledgement to Prof. Erik Cheever

1. Fox & Carreira. A Digital Signal Processing Method for Gene Prediction with Improved Noise Suppression.

2. Vaidyanathan & Yoon. Digital filters for gene prediction applications.

3. Vaidyanathan & Yoon. Gene and Exon Prediction using Allpass-based Filters.

4. Anastassiou. DSP in Genomics: Processing and Frequency-Domain Analysis of Character Strings.

5. Anastassiou. Frequency-domain analysis of biomolecular sequences.

6. Akhtar. Comparison of Gene and Exon Prediction Techniques for Detection of Short Coding Regions.