Signal processing of DNA and protein sequences

Nitesh Kumar Singh SIGNAL PROCESSING OF PROTEIN SEQUENCES AND DNA

Upload: nitesh-kumar

Post on 25-May-2015

TRANSCRIPT

Page 1: Signal processing of dna and protein sequences

Nitesh Kumar Singh

SIGNAL PROCESSING OF PROTEIN SEQUENCES AND

DNA

Page 2: Signal processing of dna and protein sequences

Signal -

A signal is a flow of information. Mathematically, signals are functions of an independent variable, such as time (for example, a speech signal) or position (for example, an image).

Page 3: Signal processing of dna and protein sequences

Biomedical Signal –

Electrical signals generated in a biological system (human or animal), or originating from a physiological process, due to the electrochemical changes that accompany signal conduction. Examples are the electroencephalogram (EEG) and the electrocardiogram (ECG).

Page 4: Signal processing of dna and protein sequences

Signal Processing Methods –

Analog or Continuous Time Signal Processing

Digital or Discrete Time Signal Processing

Page 5: Signal processing of dna and protein sequences

Advantages of DSP over ASP -

Stable, robust and accurate.

Flexible and easy to upgrade.

Easily stored.

Easy operation in a short time.

Multiplexing, as done by the Integrated Services Digital Network (ISDN).

Page 6: Signal processing of dna and protein sequences

DSP In Biomedical Signals -

Processing of biomedical signals from the biological as well as the synthetic biological world. The signals are recorded and then processed digitally. Examples: EEG, ECG.

DSP in medical imaging. Examples: CT scanners, ultrasound, endoscopes.

Manufacturing of healthcare instruments. Examples: heart rate meters, the Aspect bispectral index (BIS) monitor.

Diagnostic purposes, such as analyzing heartbeat signals to check for abnormalities and, in the same way, analyzing protein sequences to study the genomics of living beings.

Page 7: Signal processing of dna and protein sequences

Biomedical application domain using DSP -

Information gathering : Measurement of phenomena to understand the biological system.

Diagnosis : Detection of the malfunction, abnormality, pathology.

Monitoring : To obtain periodic or continuous information about the biological system.

Therapy and Control : Modify the behavior of the system and ensure the result.

Evaluation : Objective analysis, i.e. proof of performance, quality control, effect of treatment.

Page 8: Signal processing of dna and protein sequences

Processing of Biomedical Signals -

Transducers

Amplifiers and Filters

Analog to Digital conversion

Filtering to remove artifacts

Detection of events and components

Analysis of events and waves; Feature extraction

Pattern recognition, classification and diagnostic decisions

Computer-aided diagnosis and therapy

[Figure: block diagram of the stages listed above, grouped under the labels "Biomedical signals", "Signal data acquisition" and "Signal processing".]

Page 9: Signal processing of dna and protein sequences

IN THE GENOMICS WORLD

DNA and proteins are represented mathematically as character strings, in which each character is a letter of an alphabet.

For example, DNA has an alphabet of size 4, with the letters A, T, C and G.

Proteins have an alphabet of size 20 (one letter per amino acid).

Page 10: Signal processing of dna and protein sequences

REVISING SOME BIOLOGICAL FUNDAMENTALS

DNA:

It is made up of many linked smaller components, called nucleotides.

Nucleotides are of 4 types, designated A, G, T and C, and each has a 3′ end and a 5′ end.

The 3′ end of one nucleotide is linked to the 5′ end of the next by a strong covalent bond.

A strand is always read in a specific direction, written from left to right: 5′ → 3′.

Page 11: Signal processing of dna and protein sequences

Cont.

DNA occurs as a pair of strands.

The two strands of a pair are complementary to each other.

The nucleotide chains are held together by hydrogen bonds, with the pairings

A = T

C ≡ G

The 2 strands in a DNA molecule run in opposite directions.
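The pairing rules and the opposite strand directions can be illustrated with a short Python sketch. The function name `reverse_complement` is a label chosen here for illustration; it does not come from the slides.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in its own 5' -> 3' direction.

    Because the two strands run in opposite directions, the partner is
    read in reverse order, with every base replaced by its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))

print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick check that the pairing table is symmetric.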

Page 12: Signal processing of dna and protein sequences
Page 13: Signal processing of dna and protein sequences

CENTRAL DOGMA

Each DNA molecule is made up of 2 types of regions: genes and intergenic spaces.

Genes contain the information for proteins.

Each gene is responsible for the production of a protein.

A gene further has 2 types of sub-regions: introns and exons.

Genes are first transcribed into single-stranded RNA (mRNA).

Introns are then removed from the RNA by the process of splicing.

Page 14: Signal processing of dna and protein sequences

Cont.

After splicing, each mRNA is read in groups of 3 adjacent bases.

Each such group of three bases is called a codon. E.g., AGT, AAC, TGC, TAC, etc.

A codon identifies an amino acid, and the chain of amino acids defines a protein.

There are 4³ = 64 possible codons, but only 20 amino acids.

Many codons can specify 1 single amino acid (many-to-one).
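The codon count and the many-to-one relation can be checked with a small Python sketch. The table below is deliberately partial (leucine, glycine and methionine only, written in the DNA alphabet like the examples above), and the helper name `split_codons` is hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triple of bases: 4 * 4 * 4 = 64 codons.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
print(len(codons))  # 64

def split_codons(seq: str) -> list:
    """Cut a sequence into consecutive triplets, dropping a trailing partial one."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Partial codon table, for illustration only: several codons share one amino acid.
PARTIAL_CODON_TABLE = {
    "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu", "TTA": "Leu", "TTG": "Leu",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "ATG": "Met",
}

print(split_codons("AGTAACTGCTA"))  # ['AGT', 'AAC', 'TGC']
```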

Page 15: Signal processing of dna and protein sequences
Page 16: Signal processing of dna and protein sequences

Cont.

The conversion of mRNA to protein is called translation.

Translation is aided by adapter molecules called transfer RNA (tRNA).

Page 17: Signal processing of dna and protein sequences
Page 18: Signal processing of dna and protein sequences

DNA SEQUENCES AND DSP

Macromolecular biological sequences, i.e. chains of nucleotides or amino acids, are analyzed by treating them as strings of characters; for DNA the characters are “A,” “T,” “C,” and “G.” For DSP of these sequences, the characters are assigned numerical values.

Suppose we assign the number a to the character ‘A’, t to ‘T’, c to ‘C’, and g to ‘G’, where a, t, c and g are complex numbers.

Page 19: Signal processing of dna and protein sequences

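Cont.

If we take t = a* and g = c*, where * denotes complex conjugation, and x[n], n = 0, 1, …, N−1, is the numerical sequence of one strand, we can get the complementary DNA sequence (read 5′ → 3′) by:

$\tilde{x}[n] = x^{*}[N-1-n], \quad n = 0, 1, \dots, N-1$

We can also obtain numerical sequences for proteins by assigning numerical values to the amino acids.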
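A minimal Python sketch of this complex mapping follows. The particular values a = 1+j, t = 1−j, c = −1−j, g = −1+j are an assumption made here for illustration; the slides only require t = a* and g = c*. The sketch checks that the numerical sequence of the complementary strand equals the conjugated, reversed sequence of the original strand.

```python
# Illustrative weights (an assumption): they satisfy t = conj(a) and g = conj(c).
WEIGHTS = {"A": 1 + 1j, "T": 1 - 1j, "C": -1 - 1j, "G": -1 + 1j}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def to_signal(seq: str) -> list:
    """Map each base to its complex value."""
    return [WEIGHTS[base] for base in seq]

def reverse_complement(seq: str) -> str:
    """Complementary strand, written 5' -> 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

seq = "ATGCCGTA"
x = to_signal(seq)
x_complement = to_signal(reverse_complement(seq))
x_conj_reversed = [value.conjugate() for value in reversed(x)]

print(x_complement == x_conj_reversed)  # True
```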

Page 20: Signal processing of dna and protein sequences

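Indicator Sequence

The indicator sequence of adenine for a DNA sequence is defined as:

$u_A[n] = \begin{cases} 1, & s[n] = \mathrm{A} \\ 0, & \text{otherwise} \end{cases}$

where A denotes adenine and s[n] is the DNA sequence.

Similarly, we can obtain indicator sequences for the other 3 bases.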
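As a small illustration, an indicator sequence can be computed in Python as follows; the function name `indicator` is chosen here and is not from the slides.

```python
def indicator(seq: str, base: str) -> list:
    """Binary sequence with 1 where `base` occurs in `seq` and 0 elsewhere."""
    return [1 if symbol == base else 0 for symbol in seq]

seq = "ATGCA"
for base in "ATCG":
    print(base, indicator(seq, base))
# A [1, 0, 0, 0, 1]
# T [0, 1, 0, 0, 0]
# C [0, 0, 0, 1, 0]
# G [0, 0, 1, 0, 0]
```

At every position exactly one of the four indicator sequences equals 1, so the four sequences always sum to an all-ones sequence.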

Page 21: Signal processing of dna and protein sequences

Cont.

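The total spectrum of a symbolic sequence is often defined as the sum of the squared moduli of the DFTs of the indicator sequences, that is:

$S[k] = |U_A[k]|^2 + |U_T[k]|^2 + |U_C[k]|^2 + |U_G[k]|^2$

where $U_X[k] = \sum_{n=0}^{N-1} u_X[n]\, e^{-j 2\pi k n / N}$ is the DFT of the indicator sequence of base X.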
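The following Python sketch computes such a total spectrum with a direct (slow, O(N²)) DFT from the standard library, so that it needs no extra packages. The function names are chosen here for illustration. For a sequence that repeats with period 3, the spectrum shows a peak at k = N/3.

```python
import cmath

BASES = "ATCG"

def indicator(seq, base):
    return [1 if symbol == base else 0 for symbol in seq]

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def total_spectrum(seq):
    """Sum over the four bases of |DFT of the indicator sequence|^2."""
    spectra = [dft(indicator(seq, base)) for base in BASES]
    return [sum(abs(U[k]) ** 2 for U in spectra) for k in range(len(seq))]

S = total_spectrum("ATG" * 10)      # N = 30, so N/3 = 10
print(round(S[10], 6), round(S[1], 6))
```

In this toy input A, T and G each occur 10 times, always at the same position within the codon, so each contributes 10² = 100 at k = 10, while k = 1 carries essentially no energy.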

Page 22: Signal processing of dna and protein sequences

Spectral Envelope

Consider the n × 4 matrix of indicator sequences,

$U = [\, u_A \;\; u_C \;\; u_G \;\; u_T \,]$

and the vector of real weights,

$w = (a, c, g, t)^{T}$

The sequence z = Uw then corresponds to the mapping

A → a, C → c, G → g, T → t
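The mapping step alone can be sketched in Python as below: the indicator matrix has one row per position and one column per base (in the order A, C, G, T), and multiplying it by the weight vector yields the numerical sequence z. The weights 1, 2, 3, 4 are arbitrary placeholders, and the sketch does not attempt the search over weights that a spectral envelope analysis involves.

```python
BASES = "ACGT"  # column order of the indicator matrix

def indicator_matrix(seq):
    """n x 4 matrix: row i has a single 1 in the column of the base at position i."""
    return [[1 if symbol == base else 0 for base in BASES] for symbol in seq]

def apply_weights(U, w):
    """Matrix-vector product z = U w, written out in plain Python."""
    return [sum(u * wi for u, wi in zip(row, w)) for row in U]

w = [1, 2, 3, 4]                      # placeholder weights (a, c, g, t)
z = apply_weights(indicator_matrix("GATC"), w)
print(z)  # [3, 1, 4, 2]
```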

Page 23: Signal processing of dna and protein sequences

DNA walk

It is a graphical representation of a DNA sequence, termed a “fractal landscape” or “DNA walk”.

In a random walk model, a walker moves either up (u(i) = +1) or down (u(i) = −1) one unit length at each step i of the walk.

In an uncorrelated walk, the direction of each step is independent of the previous steps.

In a correlated random walk, the direction of each step depends on the history (“memory”) of the walker.

Page 24: Signal processing of dna and protein sequences

Cont.

The DNA walk is defined by the rule that the walker steps up (u(i) = +1) if a pyrimidine (C or T) occurs at position i, a linear distance i along the DNA chain, and steps down (u(i) = −1) if a purine (A or G) occurs at position i.

The degree of correlation in the base sequence can then be visualized directly by plotting the “net displacement” of the walker after l steps, $y(l) = \sum_{i=1}^{l} u(i)$.
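A DNA walk under this rule takes only a few lines of Python. The sketch below returns the net displacement after each step, i.e. the running sum of the steps u(i); the function name `dna_walk` is chosen here for illustration.

```python
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}

def dna_walk(seq: str) -> list:
    """Net displacement after each step: +1 for a pyrimidine, -1 for a purine."""
    displacement = 0
    walk = []
    for base in seq:
        if base in PYRIMIDINES:
            displacement += 1
        elif base in PURINES:
            displacement -= 1
        else:
            raise ValueError(f"unexpected symbol: {base!r}")
        walk.append(displacement)
    return walk

print(dna_walk("CCTTAG"))  # [1, 2, 3, 4, 3, 2]
```

Plotting the returned list against the position index gives the "landscape" described above.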

Page 25: Signal processing of dna and protein sequences

Gene Prediction

Characteristics of protein-coding DNA regions:

Base sequences in the protein-coding regions of DNA molecules have a period-3 component because of the codon structure involved in the translation of base sequences into amino acids.

For eukaryotes (cells with a nucleus), this periodicity has mostly been observed within the exons and not within the introns.
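One simple way to look for the period-3 component is to slide a window along the sequence and, in each window, sum the squared DFT magnitudes of the four indicator sequences at the frequency k = N/3. The Python sketch below does this on a synthetic sequence; the window width, step size and test sequence are assumptions chosen for illustration, and real genomic data is far noisier than this toy input.

```python
import cmath

def period3_power(window: str) -> float:
    """Sum over bases of |U_X[N/3]|^2; at k = N/3 the DFT kernel is exp(-j*2*pi*n/3)."""
    total = 0.0
    for base in "ACGT":
        U = sum(cmath.exp(-2j * cmath.pi * n / 3)
                for n, symbol in enumerate(window) if symbol == base)
        total += abs(U) ** 2
    return total

def sliding_period3(seq: str, width: int = 30, step: int = 3) -> list:
    """Period-3 power of each window; `width` should be a multiple of 3."""
    return [period3_power(seq[i:i + width])
            for i in range(0, len(seq) - width + 1, step)]

# Synthetic input: a period-3 stretch flanked by period-4 stretches.
seq = "ACGT" * 15 + "ATG" * 20 + "ACGT" * 15
profile = sliding_period3(seq)
peak_index = max(range(len(profile)), key=profile.__getitem__)
print(peak_index * 3)  # start position of the strongest window
```

Windows that lie inside the "ATG" stretch give a large value, while windows in the flanks stay close to zero.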

Page 26: Signal processing of dna and protein sequences

Cont.

Filtering:

A fragment of the DNA sequence is filtered with an IIR antinotch filter, i.e. a narrow band-pass filter whose passband is centred on the period-3 frequency 2π/3.
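As a rough illustration of the idea, the Python sketch below uses a plain two-pole resonator with poles at R·e^{±j2π/3}. This is a simplified stand-in chosen here, not necessarily the antinotch design used in the slides, and its gain is not normalized. The pole radius R = 0.95 is an assumed value that controls how narrow the passband is.

```python
def antinotch(x, R=0.95):
    """Two-pole resonator centred on 2*pi/3.

    H(z) = 1 / (1 + R z^-1 + R^2 z^-2), whose poles are R*exp(+-j*2*pi/3),
    giving the difference equation y[n] = x[n] - R*y[n-1] - R^2*y[n-2].
    """
    y = []
    y1 = y2 = 0.0                     # y[n-1] and y[n-2]
    for sample in x:
        out = sample - R * y1 - R * R * y2
        y.append(out)
        y2, y1 = y1, out
    return y

def tail_energy(y, count=60):
    """Energy of the last `count` output samples (after the start-up transient)."""
    return sum(v * v for v in y[-count:])

# Indicator sequences of A for a period-3 and a period-4 toy sequence.
pulses3 = [1.0 if s == "A" else 0.0 for s in "ATG" * 40]
pulses4 = [1.0 if s == "A" else 0.0 for s in "ACGT" * 30]

e3 = tail_energy(antinotch(pulses3))
e4 = tail_energy(antinotch(pulses4))
print(e3 > e4)  # True: the period-3 input passes, the period-4 input is attenuated
```

In practice the output energy would be tracked along the sequence, and regions where it stays high would be treated as candidate coding regions.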

Page 27: Signal processing of dna and protein sequences

Cont.

DNA Spectrogram:

The appearance of a spectrogram provides significant information about a signal.

DNA spectrograms provide local frequency information for all four bases by displaying three resulting magnitudes, x, y and z, as a superposition of the corresponding three primary colors:

red for x, green for y, blue for z

Page 28: Signal processing of dna and protein sequences

Cont.

Page 29: Signal processing of dna and protein sequences

Cont.

Page 30: Signal processing of dna and protein sequences

Cont.

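Identification of protein-coding DNA regions:

First, the DFTs of the indicator sequences of the four bases are calculated by the formula

$U_X[k] = \sum_{n=0}^{N-1} u_X[n]\, e^{-j 2\pi k n / N}, \quad X \in \{A, T, C, G\}$

Writing A, T, C and G for the values of these DFTs at k = N/3, we form:

W = aA + tT + cC + gG.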
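The quantity W can be sketched in Python as follows. The complex weights are illustrative placeholders (an assumption), not optimized values, and the function names are chosen here. The example also shows why the phase of W is informative: shifting the reading frame of a period-3 sequence by one base rotates W by 120° while leaving its magnitude unchanged.

```python
import cmath

# Placeholder weights (an assumption); any complex a, t, c, g could be used.
WEIGHTS = {"A": 1 + 1j, "T": 1 - 1j, "C": -1 - 1j, "G": -1 + 1j}

def dft_at_one_third(seq: str, base: str) -> complex:
    """DFT of the indicator sequence of `base` at k = N/3 (len(seq) a multiple of 3)."""
    return sum(cmath.exp(-2j * cmath.pi * n / 3)
               for n, symbol in enumerate(seq) if symbol == base)

def frame_statistic(seq: str) -> complex:
    """W = a*A + t*T + c*C + g*G, with A, T, C, G the DFT values at k = N/3."""
    return sum(WEIGHTS[base] * dft_at_one_third(seq, base) for base in "ATCG")

W1 = frame_statistic("ATG" * 10)
W2 = frame_statistic("TGA" * 10)      # same sequence, reading frame shifted by one base
rotation = cmath.exp(2j * cmath.pi / 3)
print(abs(W2 - W1 * rotation) < 1e-9)  # True
```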

Page 31: Signal processing of dna and protein sequences

Color coding and color map approach

Since the number of primary colors is the same as the number of possible coding reading frames, a color-coding scheme is applied to the phase Θ of W. In this scheme,

the value Θ = 0° is assigned to the color RED

the value Θ = 120° is assigned to the color BLUE

the value Θ = −120° is assigned to the color GREEN

Page 32: Signal processing of dna and protein sequences

Cont.

In-between values are color-coded in a linear manner in RGB space, in which the three axes labeled R, G, and B correspond to the primary colors red, green, and blue.

Page 33: Signal processing of dna and protein sequences

Cont.

In the color map, the intensity is modulated by the squared magnitude multiplied by 700 and clipped to the interval [0, 1].
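A possible reading of this color scheme is sketched below in Python. The three anchor colors and the factor 700 come from the slides; the piecewise-linear blend between neighbouring anchors is one plausible interpretation of "in a linear manner" and should be taken as an assumption, as should the function names.

```python
import cmath
import math

RED, GREEN, BLUE = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

# Anchor phases in degrees; -120 degrees is the same angle as 240 degrees.
ANCHORS = [(0.0, RED), (120.0, BLUE), (240.0, GREEN), (360.0, RED)]

def phase_to_rgb(theta_deg: float) -> tuple:
    """Blend linearly between the two anchor colors that bracket the phase."""
    theta = theta_deg % 360.0
    for (t0, c0), (t1, c1) in zip(ANCHORS, ANCHORS[1:]):
        if t0 <= theta <= t1:
            f = (theta - t0) / (t1 - t0)
            return tuple((1 - f) * u + f * v for u, v in zip(c0, c1))
    raise ValueError("phase outside the anchor range")

def color_of(W: complex, gain: float = 700.0) -> tuple:
    """Hue from the phase of W, intensity from gain*|W|^2 clipped to [0, 1]."""
    intensity = min(1.0, max(0.0, gain * abs(W) ** 2))
    rgb = phase_to_rgb(math.degrees(cmath.phase(W)))
    return tuple(intensity * channel for channel in rgb)

print(phase_to_rgb(0), phase_to_rgb(120), phase_to_rgb(-120))
```

The factor 700 only makes sense together with the normalization of W used in the original analysis, so `gain` is left as a parameter here.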

Page 34: Signal processing of dna and protein sequences

Disadvantages

The obstacles involved include large amounts of data, the lack of complete a priori knowledge of the genome length, and the difficulty of recognizing nucleotide symbol identity with complete accuracy.

These impediments are typical of ones encountered in standard telecommunications problems.

When Fourier transforms are applied after a symbolic-to-numeric mapping, the choice of mapping may either expose or hide some frequency information.

Furthermore, there might be no biochemical meaning for the ordering and arithmetic structure that result from the symbolic to numeric mapping.

Page 35: Signal processing of dna and protein sequences

Conclusion -

Signal processing-based computational and visual tools are meant to synergistically complement character-string-domain tools that have successfully been used for many years by computer scientists.

The assignment of optimized, complex numerical values to nucleotides and amino acids provides a new computational framework, which may also result in new techniques for the solution of useful problems in bioinformatics, including sequence alignment, macromolecular structure analysis, and phylogeny.

A field of computer science, bioinformatics, has emerged, focusing on the use of computers for efficiently deriving, storing, and analyzing these character strings to help solve problems in molecular biology.

Page 36: Signal processing of dna and protein sequences

THANK YOU!!