location of exons in dna sequences using digital filters

13
Location of Exons in DNA Location of Exons in DNA Sequences Sequences Using Digital Filters Using Digital Filters Parameswaran Ramachandran, Wu-Sheng Lu, and Andreas Antoniou Parameswaran Ramachandran, Wu-Sheng Lu, and Andreas Antoniou Department of Electrical Engineering, University of Victoria, BC, Canada. ISCAS, Taipei ISCAS, Taipei May 27, 2009 May 27, 2009

Upload: lowri

Post on 19-Jan-2016

22 views

Category:

Documents


0 download

DESCRIPTION

Location of Exons in DNA Sequences Using Digital Filters. Parameswaran Ramachandran, Wu-Sheng Lu, and Andreas Antoniou. ISCAS, Taipei May 27, 2009. Department of Electrical Engineering, University of Victoria, BC, Canada. DNA. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Location of Exons in DNA Sequences  Using Digital Filters

Location of Exons in DNA Location of Exons in DNA

Sequences Sequences

Using Digital FiltersUsing Digital Filters

Parameswaran Ramachandran, Wu-Sheng Lu, and Andreas AntoniouParameswaran Ramachandran, Wu-Sheng Lu, and Andreas Antoniou

Department of Electrical Engineering,University of Victoria, BC, Canada.

ISCAS, TaipeiISCAS, TaipeiMay 27, 2009May 27, 2009

Page 2: Location of Exons in DNA Sequences  Using Digital Filters

22

DNADNA The instructions to build and maintain a living The instructions to build and maintain a living

organism are encoded in its organism are encoded in its genome genome which is which is made up ofmade up of DNA.DNA.

DNA is composed of smaller components called DNA is composed of smaller components called nucleotides.nucleotides. There are four types of nucleotides There are four types of nucleotides denoted by the letters denoted by the letters A, T, G,A, T, G, and and C.C.

DNA comprises a pair of strands. Nucleotides pair DNA comprises a pair of strands. Nucleotides pair up across the two strands. up across the two strands. AA always pairs with always pairs with TT and and GG always pairs with always pairs with C;C; in effect, the two in effect, the two strands are strands are complementary.complementary.

Page 3: Location of Exons in DNA Sequences  Using Digital Filters

33

Genes and ExonsGenes and Exons

Regions in a genome that code for proteins are Regions in a genome that code for proteins are called genes. Genes are further split into coding called genes. Genes are further split into coding regions called regions called exonsexons and noncoding regions called and noncoding regions called introns. introns.

Accurate location of exons in genomes is very Accurate location of exons in genomes is very important for understanding life processes.important for understanding life processes.

Organization of genes.

Page 4: Location of Exons in DNA Sequences  Using Digital Filters

44

Location of ExonsLocation of Exons

The power spectra of DNA segments corresponding The power spectra of DNA segments corresponding to exons exhibit a relatively strong component atto exons exhibit a relatively strong component atThis is known as the This is known as the period-3 property.period-3 property.

Thus, exons can be located by mapping the DNA Thus, exons can be located by mapping the DNA characters into numbers and then tracking the characters into numbers and then tracking the strength of the period-3 component along the strength of the period-3 component along the length of the DNA sequence of interest.length of the DNA sequence of interest.

Electron-ion interaction potential (EIIP) values have Electron-ion interaction potential (EIIP) values have been used by us earlier for the location of hot been used by us earlier for the location of hot spots in proteins. A narrowband bandpass digital spots in proteins. A narrowband bandpass digital filter was used to select the characteristic filter was used to select the characteristic frequency.frequency.

Here, we apply the filtering technique for the Here, we apply the filtering technique for the location of exons in DNA sequences.location of exons in DNA sequences.

2π/3.

Page 5: Location of Exons in DNA Sequences  Using Digital Filters

55

Filter-Based Exon Location TechniqueFilter-Based Exon Location Technique

1.1. The DNA character sequence of interest is mapped The DNA character sequence of interest is mapped onto a numerical sequence using EIIP values.onto a numerical sequence using EIIP values. The EIIP of a nucleotide is a physical quantity denoting the The EIIP of a nucleotide is a physical quantity denoting the

average energy of valence electrons in the nucleotide.average energy of valence electrons in the nucleotide. The EIIP sequence is a weighted sum of four indicator The EIIP sequence is a weighted sum of four indicator

sequences and can be represented bysequences and can be represented by

EIIP A A T T G G C Cw w w w x x x x x

NucleotideNucleotide EIIPEIIP

Adenine (A)Adenine (A) 0.12600.1260

Thymine (T)Thymine (T) 0.13350.1335

Guanine (G)Guanine (G) 0.08060.0806

Cytosine (C)Cytosine (C) 0.13400.1340

( )Aw

( )Tw

( )Gw

( )Cw

EIIP Values for the Nucleotides

Page 6: Location of Exons in DNA Sequences  Using Digital Filters

66Filter-Based Technique (cont’d)Filter-Based Technique (cont’d)2.2. A narrowband bandpass digital filter with its A narrowband bandpass digital filter with its

passband centered at the period-3 frequency is used passband centered at the period-3 frequency is used to filter the DNA sequence.to filter the DNA sequence.

3.3. The filtered output is an amplitude modulated signal The filtered output is an amplitude modulated signal as shown below.as shown below.

Plot of for gene AF039307.

[ ]y n

Page 7: Location of Exons in DNA Sequences  Using Digital Filters

77Filter-Based Technique (cont’d)Filter-Based Technique (cont’d)

4.4. The power of the output signal, , is filtered The power of the output signal, , is filtered using a lowpass filter to identify the exon locations as using a lowpass filter to identify the exon locations as distinct peaks.distinct peaks.

To eliminate phase distortion, zero-phase filtering is To eliminate phase distortion, zero-phase filtering is employed for both the filtering procedures.employed for both the filtering procedures.

A schematic of the complete system is shown below.A schematic of the complete system is shown below.

2( [ ])y n

The complete exon location system.

Page 8: Location of Exons in DNA Sequences  Using Digital Filters

88

ResultsResults

To evaluate the performance of the proposed To evaluate the performance of the proposed technique, we applied it to a set of five genes whose technique, we applied it to a set of five genes whose sequences and true exon locations were downloaded sequences and true exon locations were downloaded from the well-known HMR195 dataset.from the well-known HMR195 dataset.

We used an inverse-Chebyshev bandpass filter of We used an inverse-Chebyshev bandpass filter of order 6 followed by an inverse-Chebyshev lowpass order 6 followed by an inverse-Chebyshev lowpass filter of order 14.filter of order 14.

The exon locations were identified by our technique The exon locations were identified by our technique for all the five genes.for all the five genes.

For comparison, we also implemented the STDFT For comparison, we also implemented the STDFT technique employed by Nair and Sreenadhan (2006).technique employed by Nair and Sreenadhan (2006).

Page 9: Location of Exons in DNA Sequences  Using Digital Filters

99

Results (cont’d)Results (cont’d)

Exon locations for gene AF039307predicted by the filter-based technique.

Exon locations for gene AF039307predicted by the STDFT-based technique.

Note that for the exon near location 4000, the Note that for the exon near location 4000, the filter-based technique exhibits a well-defined peak, filter-based technique exhibits a well-defined peak, while the STDFT technique does not.while the STDFT technique does not.

Page 10: Location of Exons in DNA Sequences  Using Digital Filters

1010

Results (cont’d)Results (cont’d)

Exon locations for gene AF009614predicted by the filter-based technique.

Exon locations for gene AF009614predicted by the STDFT-based technique.

Note that for the exon near location 4000, the Note that for the exon near location 4000, the filter-based technique exhibits a single well-filter-based technique exhibits a single well-defined peak, while the STDFT technique exhibits defined peak, while the STDFT technique exhibits two peaks leading to an ambiguity.two peaks leading to an ambiguity.

Page 11: Location of Exons in DNA Sequences  Using Digital Filters

1111Results (cont’d)Results (cont’d)

To compare the computational efficiencies of the two To compare the computational efficiencies of the two techniques, we computed the average CPU times over 1000 techniques, we computed the average CPU times over 1000 runs of the techniques for the five genes. runs of the techniques for the five genes.

Results show that the filter-based technique requires only Results show that the filter-based technique requires only about 3% of the computational load required by the STDFT-about 3% of the computational load required by the STDFT-based technique.based technique.

Page 12: Location of Exons in DNA Sequences  Using Digital Filters

1212

ConclusionsConclusions

A filtering technique for the location of hot spots in A filtering technique for the location of hot spots in proteins reported by us earlier was applied for the proteins reported by us earlier was applied for the location of exons in DNA sequences.location of exons in DNA sequences.

Results show that the proposed technique is both Results show that the proposed technique is both more accurate and computationally much more more accurate and computationally much more efficient than another computational STDFT-based efficient than another computational STDFT-based technique.technique.

Drawing from the successful application of EIIP Drawing from the successful application of EIIP values for protein analysis, the application of EIIP values for protein analysis, the application of EIIP values for DNA analysis may lead to improved values for DNA analysis may lead to improved modeling of the interrelations between DNA and modeling of the interrelations between DNA and proteins.proteins.

Page 13: Location of Exons in DNA Sequences  Using Digital Filters

1313

ReferencesReferences

1. S. Tiwari, S. Ramachandran, A. Bhattacharya, S. Bhattacharya, and R. Ramaswamy, “Prediction of probable genes by fourier analysis of genomic sequences,” Computer Applications in the Biosciences (CABIOS), vol. 13, no. 3, pp. 263–270, 1997.

2. D. Anastassiou, “Genomic signal processing,” IEEE Signal Processing Magazine, pp. 8–20, Jul. 2001.

3. A. S. Nair and S. P. Sreenadhan, “A coding measure scheme employing electron-ion interaction pseudopotential (EIIP),” Bioinformation, vol. 1, no. 6, pp. 197–202, 2006.

4. S. Rogic, A. K. Mackworth, and F. B. F. Ouellette, “Evaluation of genefinding programs on mammalian sequences,” Genome Research, vol. 11, no. 5, pp. 817–832, May 2001.

5. P. Ramachandran, W.-S. Lu, and A. Antoniou, “Improved hot-spot location technique for proteins using a bandpass notch digital filter,” in IEEE International Symposium on Circuits and Systems, Seattle, May 2008, pp. 2673–2676.