BI0500- APPLICATIONS OF MATLAB IN BIOINFORMATICS I M.TECH BIOINFORMATICS SEMESTER –II YEAR: 2010-2011
NAME :
REG No:
SCHOOL OF BIOENGINEERING
DEPARTMENT OF BIOINFORMATICS
SRM UNIVERSITY (Under section 3 of UGC Act, 1956)
S.R.M NAGAR, KATTANKULATHUR-603 203
KANCHEEPURAM DISTRICT
Bonafide certification will be available at the library. Collect it and put it here.
BONAFIDE CERTIFICATE
Register No:
Certified to be the bonafide record of work done by ________ of ________ M.Tech. Degree
course in the practical ________ in SRM UNIVERSITY, KATTANKULATHUR
during the academic year ________.
Lab Incharge
Date:                    Head of the Department
Submitted for University Examination held in ________ at SRM UNIVERSITY, KATTANKULATHUR.
Date:          Internal Examiner          External Examiner
INDEX
Sl.no    Date    Name of the experiment    Page no    Sign
1. BASIC MATLAB OPERATIONS
2. SOLVING DIFFERENTIAL EQUATION USING MATLAB
3. USING M-FILES AND OPERATIONS OF M-FILES
4. SEQUENCE AND STRUCTURAL ANALYSIS
5. MODELING OF BIOLOGICAL SYSTEMS
6. SIMULATIONS AND ANALYSIS OF BIOLOGICAL SYSTEMS.
Expt No.1: BASIC MATLAB OPERATIONS
Date:
AIM: To understand basic operations in MATLAB.
INTRODUCTION:
Write some information about MATLAB. Mention the importance of array and matrix operations in the procedure.
CREATING A MATRIX:
>> A=[1 2 3 4] --Row matrix
>> B=[1;2;3;4] --Column matrix (elements separated by ;)
>> C=[1 2 3 4;2 3 4 5;3 4 5 6] --3-by-4 matrix (rows separated by ;)
>> D=[1;2 3;4 5 6;7 8 9 0] --rows of unequal length, hence Error
ARITHMETIC OPERATIONS:
>> sum(A)
>> A'
>> trace(C) --Error, because the 3-by-4 matrix C is not square
>> C=[1 2 3 4;2 3 4 5;3 4 5 6;4 5 6 7] --square matrix
>> trace(C) --sum of the main diagonal = 16; for the secondary diagonal use sum(diag(fliplr(C)))=16
COLON OPERATOR:
>> 1:1:10
>> 10:-1:2
>> C(1:3,2)
>> sum(C(:,end)) --sum of all elements in last column
>> A=C(:,[4 1]) --extract elements in columns 4 and 1
SPECIAL MATRICES:
1. zeros(2,2) --creates a matrix with zeros
2. ones(1,3) --creates a matrix with ones
3. pascal(3) --creates a symmetric matrix
4. I=eye(3) --creates an identity matrix
5. magic(4) --sum of elements along cols and rows are same
6. poly(A) --returns the characteristic polynomial of a square matrix A
7. inv(A) --returns the inverse of a square matrix A
8. M=magic(4); M/sum(M(:,1)) --doubly stochastic matrix. Powers of this matrix yield transition probabilities in Markovian processes
9. rref(A) --returns the row-reduced echelon form of A
10. rank(A) --returns the rank of the matrix A
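The matrix commands in this section can be checked together in one short script. The following is a minimal sketch using only base MATLAB functions; the variable names (mainDiag, antiDiag, lastCol, sub, M, P) are illustrative and not part of the original exercise, and the magic square is scaled by its common row and column sum so that every row and column adds up to 1.

```matlab
% 4-by-4 square matrix used in the trace example
C = [1 2 3 4; 2 3 4 5; 3 4 5 6; 4 5 6 7];

mainDiag = trace(C);                 % sum of the main diagonal
antiDiag = sum(diag(fliplr(C)));     % sum of the secondary diagonal
lastCol  = sum(C(:, end));           % sum of the last column
sub      = C(:, [4 1]);              % columns 4 and 1, in that order

% Scale a magic square by its common row/column sum so that every
% row and every column adds up to 1 (a doubly stochastic matrix).
M = magic(4);
P = M / sum(M(:, 1));

fprintf('trace = %d, secondary diagonal = %d, last column = %d\n', ...
        mainDiag, antiDiag, lastCol);
disp(sum(P, 1));    % column sums of P
disp(sum(P, 2)');   % row sums of P
```

Dividing by the scalar sum(M(:,1)) is used here instead of dividing by the vector sum(M), because the / operator with a vector on the right performs matrix right division rather than element-wise scaling.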
POLYNOMIAL OPERATIONS:
>> v1=[1 -2 1] --Represent a polynomial
>> polyval(v1,2)=1 --calculates the value of the polynomial at x=2
>> r=roots(v1)
>> p=poly(r) --Coefficients of the polynomial, given the roots r in a vector
>> v2=[1 -1]
>> c=conv(v1,v2) --product of two polynomials
>> polyder(v1,v2) --derivative of the product of the two polynomials
>> [Q,R]=deconv(B,A) --Division of two polynomials; B = conv(A,Q)+R.
Q--Quotient, R--Remainder
>> p=polyfit(x, y, n); --n is the degree of the polynomial and x and y are the vectors.
>> y2=polyval(p,x2) --returns the value of the polynomial using the values in the x2 vector
BASIC GRAPHICS:
>> plottools --a command used to open the palette and draw appropriate graphs in them
• plot command and its options
• plotyy command
• Annotating a picture using xlabel(), ylabel(), title().
• title('title', 'FontSize', 20)
• Adjusting line width
• linspace(-pi, pi, 50) command
• legend('')
• area(x,y) --area plot of x and y
ADVANCED GRAPHICS:
• Graphing imaginary and complex data
• axis equal
• hold on
• subplot(m, n, p)
• axis command
• meshgrid()
• surf function
• colormap function
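The polynomial and plotting commands listed in this experiment can be combined in one short script. This is a minimal sketch, not part of the original record: the sample grid, the plot labels and the variable names valueAt2 and dprod are assumptions chosen for illustration.

```matlab
v1 = [1 -2 1];                    % x^2 - 2x + 1
v2 = [1 -1];                      % x - 1

valueAt2 = polyval(v1, 2);        % value of v1 at x = 2
c        = conv(v1, v2);          % product polynomial v1*v2
[Q, R]   = deconv(c, v2);         % dividing the product by v2 recovers v1
dprod    = polyder(v1, v2);       % derivative of the product v1*v2

% Fit a quadratic to samples of v1 and evaluate the fit on a finer grid.
x  = linspace(-2, 2, 9);
y  = polyval(v1, x);
p  = polyfit(x, y, 2);
x2 = linspace(-2, 2, 50);
y2 = polyval(p, x2);

% Plot samples and fitted curve with annotations.
plot(x, y, 'o', x2, y2, '-', 'LineWidth', 2);
xlabel('x'); ylabel('y');
title('Quadratic fit', 'FontSize', 20);
legend('samples', 'fit');
```

Because the samples come from an exact quadratic, the fitted coefficients p should agree with v1 up to rounding error, which makes the script a quick self-check of polyfit and polyval.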
Expt No.2: SOLVING DIFFERENTIAL EQUATION USING MATLAB
Date:
AIM: To understand the use of MATLAB in solving simple and complex differential
equations.
INTRODUCTION:
Ordinary differential equations (ODEs) are used throughout engineering,
mathematics, and science to describe how physical quantities change, and solving them is a
standard part of work in these fields. MATLAB is used to solve many of these problems because it is a
convenient and widely used problem-solving environment (PSE) with quality solvers
that are easy to use. It is also such a high-level programming language that
programs are short, making it practical to list complete programs for all the examples.
An ODE represents a relationship between a function and its derivatives. One such relation
taken up early in calculus courses is the linear ordinary differential equation
y'(t) = y(t)     (1.1)
Often solutions are specified by means of an initial value. For example, there is a
unique solution of the ODE (1.1) for which y(0) = 1, namely y(t) = e^t. This is an example of
an initial value problem (IVP) for an ODE. Like this example, the IVPs that arise in practice
generally have one and only one solution.
When the solution of an ODE is instead specified by conditions at both ends of an interval,
the problem is a boundary value problem (BVP). A linear homogeneous BVP with zero boundary
values always has the trivial solution y(x) ≡ 0. The BVPs that arise in practice may have no
solution, a unique solution, or more than one solution. If there is more than one solution,
there may be a finite number or an infinite number of them.
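The initial value problem described in the introduction can also be solved numerically. The sketch below assumes that the ODE in question is y'(t) = y(t) with y(0) = 1, which is consistent with the stated solution e^t; the interval [0, 2] and the tolerances are arbitrary choices made for illustration.

```matlab
% IVP: y'(t) = y(t), y(0) = 1, whose exact solution is exp(t).
rhs   = @(t, y) y;                 % right-hand side of the ODE
tspan = [0 2];                     % integration interval (assumed)
y0    = 1;                         % initial value

opts   = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[t, y] = ode45(rhs, tspan, y0, opts);

maxErr = max(abs(y - exp(t)));     % deviation from the exact solution
fprintf('largest error on [0, 2]: %.2e\n', maxErr);
```

Comparing against a known closed-form solution in this way is a simple check that the solver options are tight enough before moving on to problems without an analytic answer.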
Solving linear differential equation:
1. >> y=dsolve('Dy=y^2,y(5)=22')
y =
-1/(t - 111/22)
2. >> y=dsolve('(Dy)^2+y^2=1,y(0)=0')
y =
(4*tan(t/4)*(tan(t/4)^2 - 1))/(tan(t/4)^2 + 1)^2
-(4*tan(t/4)*(tan(t/4)^2 - 1))/(tan(t/4)^2 + 1)^2
(4*tan(pi/4 + t/4)*(tan(pi/4 + t/4)^2 - 1))/(tan(pi/4 + t/4)^2 + 1)^2
(4*tan(pi/4 - t/4)*(tan(pi/4 - t/4)^2 - 1))/(tan(pi/4 - t/4)^2 + 1)^2
(4*tan(t/4 - pi/4)*(tan(t/4 - pi/4)^2 - 1))/(tan(t/4 - pi/4)^2 + 1)^2
(4*tan(- pi/4 - t/4)*(tan(- pi/4 - t/4)^2 - 1))/(tan(- pi/4 - t/4)^2 + 1)^2
Solving non-linear differential equation:
1. >> x=dsolve('(Dx+x)^2=1,x(0)=0')
x =
1 - 1/exp(t)
1/exp(t) - 1
2. >> x=dsolve('(Dx+x)^2=1,x(0)=0','u')
x =
1 - 1/exp(u)
1/exp(u) - 1
3. >> u = dsolve('D3u=u','u(0)=1','Du(0)=-1','D2u(0) = pi','x')
u =
(pi*exp(x))/3 - (cos((3^(1/2)*x)/2)*(pi/3 - 1))/exp(x/2) - (3^(1/2)*sin((3^(1/2)*x)/2)*(pi + 1))/(3*exp(x/2))
4. >> y = dsolve('2*x^2*D2y + 3*x*Dy - y = 0','x')
y =
C6/(3*x) + C7*x^(1/2)
5. >> y=dsolve('D2y=x*y','x')
y =
C10*airyBi(x, 0) + C9*airyAi(x, 0)
6. >> a=dsolve('Df=3*f+4*g','Dg=-4*f+3*g')
a =
f: [1x1 sym]
g: [1x1 sym]
7. >> x=dsolve('Dx=x*(k1-k2)-k3*c1')
x =
(c1*k3 - C14*exp(t*(k1 - k2)))/(k1 - k2)
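A closed-form result from dsolve can be cross-checked numerically without the Symbolic Math Toolbox. The sketch below takes the problem Dy = y^2, y(5) = 22 from this experiment, for which separation of variables gives y = -1/(t - 111/22), and compares that formula with an ode45 solution. The direction of integration, the interval and the tolerances are assumptions made for the illustration.

```matlab
% Closed-form solution of Dy = y^2 with y(5) = 22.
yExact = @(t) -1 ./ (t - 111/22);

% Integrate the same IVP numerically, backwards from t = 5 to t = 0.
% The solution blows up just past t = 111/22, so we stay to its left.
opts   = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[t, y] = ode45(@(t, y) y.^2, [5 0], 22, opts);

maxErr = max(abs(y - yExact(t)));  % largest deviation along the path
fprintf('y(0): numeric %.6f, closed form %.6f\n', y(end), yExact(0));
```

Integrating away from the singularity keeps the step sizes reasonable; integrating forward from t = 5 would run into the blow-up almost immediately.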
Expt No.-3: Using M-Files and operations of M-files
script file consists of a sequence of normal MATLAB statements .M –File Scripts are the file because they have no input or output arguments. They are useful for
ATLAB commands, such as computations that you have to perform peatedly from the command line.
>>edit bioinfo.m Then a window opens.
Date: AIM: To study about the usage of M-Files and the operations of M-files INTRODUCTION: Asimplest kind of M-automating series of Mre
On pressing yes the editor opens
Enter on the editor: a =5 b=3 Save this in the matlab directory. Then on the command line type: >>bioinfo.m >>bioinfo.m a = 5 b = 2 To change the path to a new directory: >>addpath('D:\') Basic Parts of an M-File: This simple function shows the basic parts of an M-file. Line that begins with % is not executable:
function f = fact(n) % Compute a factorial value. % FACT(N) returns the factorial of N, % usually denoted by N!
% Put simply, FACT(N) is PROD(1:N). f = prod(1:n);
Save it and go for RUN option in the Editor: >>n=3 >>bioinfo(n) >>what M-files in the current directory C:\Users\Documents\MATLAB bioinfo >>dir . .. bioinfo.asvbioinfo.m >>type bioinfo (The entire contents of the bioinfo file is shown) function f = fact(n) % Compute a factorial value. % FACT(N) returns the factorial of N, % usually denoted by N! % Put simply. f = prod(1:N);
P-code is used to create Protected file. The following program has to be entered in the editor : % An M -file script to produce % Comment lines % "flower petal" plots theta = -pi:0.01:pi; % Computations rho(1,:) = 2 * sin(5 * theta) .^ 2; rho(2,:) = cos(10 * theta) .^ 3; rho(3,:) = sin(theta) .^ 2; rho(4,:) = 5 * cos(3.5 * theta) .^ 3; for k = 1:4 polar(theta, rho(k,:)) % Graphics output pause end This is saved and run.
A new window opens as follows:
On pressing ENTER in the command line the diagram gets changed.
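The factorial function from this experiment can be tried out without creating a separate file by writing it as an anonymous function. This is only a sketch for quick experimentation, not a replacement for the M-file; the names mine and builtin are illustrative, and the comparison uses MATLAB's own factorial function.

```matlab
% Anonymous-function version of the factorial shown in the text.
fact = @(n) prod(1:n);

n       = 0:6;
mine    = arrayfun(fact, n);     % prod(1:0) is 1, so fact(0) = 1
builtin = factorial(n);          % MATLAB's own factorial for comparison

disp([n; mine; builtin]);        % rows: n, fact(n), factorial(n)
```

Note that MATLAB is case-sensitive, so the body must use the same lower-case n as the argument list; writing prod(1:N) would refer to an undefined variable.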
Expt No.-4: SEQUENCE AND STRUCTURAL ANALYSIS Date:
AIM:
The aim of the experiment is to analyse sequence alignments and the structure of proteins
in MATLAB.
INTRODUCTION:
SEQUENCE ALIGNMENT
In bioinformatics, a sequence alignment is a way of arranging the sequences of
DNA, RNA, or protein to identify regions of similarity that may be a consequence of
functional, structural, or evolutionary relationships between the sequences. Aligned
sequences of nucleotide or amino acid residues are typically represented as rows within a
matrix. Gaps are inserted between the residues so that identical or similar characters are
aligned in successive columns.
GLOBAL AND LOCAL ALIGNMENTS
Global alignments, which attempt to align every residue in every sequence, are most
useful when the sequences in the query set are similar and of roughly equal size. (This does
not mean global alignments cannot end in gaps.) A general global alignment technique is the
Needleman-Wunsch algorithm, which is based on dynamic programming. Local alignments
are more useful for dissimilar sequences that are suspected to contain regions of similarity or
similar sequence motifs within their larger sequence context. The Smith-Waterman algorithm
is a general local alignment method also based on dynamic programming. With sufficiently
similar sequences, there is no difference between local and global alignments.
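The difference between the two dynamic programming schemes can be seen in a few lines of base MATLAB. The sketch below fills a Needleman-Wunsch table and a Smith-Waterman table side by side for two short made-up DNA strings. The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption chosen for simplicity and is not the substitution-matrix scoring that toolbox alignment functions use by default.

```matlab
% Toy scoring scheme (assumed): match +1, mismatch -1, gap -1.
s1 = 'GATTACA';
s2 = 'GCATGCT';
matchS = 1; mismatchS = -1; gapS = -1;

n = length(s1); m = length(s2);
G = zeros(n + 1, m + 1);          % global (Needleman-Wunsch) table
L = zeros(n + 1, m + 1);          % local (Smith-Waterman) table
G(:, 1) = (0:n)' * gapS;          % leading gaps are penalised globally
G(1, :) = (0:m)  * gapS;

for i = 1:n
    for j = 1:m
        if s1(i) == s2(j)
            sub = matchS;
        else
            sub = mismatchS;
        end
        G(i+1, j+1) = max([G(i, j) + sub, G(i, j+1) + gapS, G(i+1, j) + gapS]);
        % A local alignment may restart anywhere, hence the floor at 0.
        L(i+1, j+1) = max([0, L(i, j) + sub, L(i, j+1) + gapS, L(i+1, j) + gapS]);
    end
end

globalScore = G(end, end);        % best score over the full lengths
localScore  = max(L(:));          % best score over any pair of substrings
fprintf('global score = %d, local score = %d\n', globalScore, localScore);
```

Only the scores are computed here; recovering the aligned strings would additionally require a traceback through the tables, which is omitted to keep the sketch short.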
PAIRWISE ALIGNMENT:
Pairwise sequence alignment methods are used to find the best-matching piecewise
(local) or global alignments of two query sequences.
Pairwise alignments can only be used between two sequences at a time, but they are efficient
to calculate and are often used for methods that do not require extreme precision (such as
searching a database for sequences with high similarity to a query). The three primary
methods of producing pairwise alignments are dot-matrix methods, dynamic programming,
and word methods. However, multiple sequence alignment techniques can also align pairs of
sequences. Although each method has its individual strengths and weaknesses, all three
pairwise methods have difficulty with highly repetitive sequences of low information content
- especially where the number of repetitions differs in the two sequences to be aligned. One
way of quantifying the utility of a given pairwise alignment is the 'maximum unique match' (MUM),
or the longest subsequence that occurs in both query sequences. Longer MUM sequences
typically reflect closer relatedness.
MULTIPLE SEQUENCE ALIGNMENT:
Multiple sequence alignment is an extension of pairwise alignment to incorporate
more than two sequences at a time. Multiple alignment methods try to align all of the
sequences in a given query set. Multiple alignments are often used in identifying conserved
sequence regions across a group of sequences hypothesized to be evolutionarily related. Such
conserved sequence motifs can be used in conjunction with structural and mechanistic
information to locate the catalytic active sites of enzymes. Alignments are also used to aid in
establishing evolutionary relationships by constructing phylogenetic trees. The utility of these
alignments in bioinformatics has led to the development of a variety of methods suitable for
aligning three or more sequences.
PROTOCOL:
SEQUENCE ANALYSIS:
A.Retrieving Sequence Information from a Public Database
1. Open the MATLAB Help browser to the NCBI Web site. In the MATLAB Command
Window, type
web('http://www.ncbi.nlm.nih.gov/')
2. Search for the gene you are interested in studying, for example the hexosaminidase A enzyme
(HEXA). The NCBI reference for the human gene HEXA has accession number NM_000520.
Get the HEXA sequence data into the MATLAB environment by typing
humanHEXA = getgenbank('NM_000520')
Get the sequence information for the mouse gene into the MATLAB environment by typing
mouseHEXA = getgenbank('AK080777')
B.Locating Protein Coding Sequences
Looking for open reading frames in the human and mouse HEXA gene.
humanORFs=seqshoworfs(humanHEXA.Sequence)
mouseORFs = seqshoworfs(mouseHEXA.Sequence)
C. To perform a dot plot to compare two protein sequences
“nt2aa” is a command used to convert the nucleotide sequences into the corresponding
amino acid sequences
humanProtein = nt2aa(humanHEXA.Sequence);
mouseProtein = nt2aa(mouseHEXA.Sequence);
seqdotplot(humanProtein,mouseProtein)
ylabel('Human hexosaminidase A');
xlabel('Mouse hexosaminidase A');
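The idea behind a dot plot can be reproduced with base MATLAB alone. The sketch below builds the dot matrix for two short made-up peptide strings (they are not the HEXA sequences) and counts the matches on the main diagonal; the variable names dots and onDiagonal are illustrative.

```matlab
% Two short made-up peptide strings (illustrative only).
a = 'MKVLAAGIV';
b = 'MKILAAGLV';

% dots(i, j) is true when residue i of a equals residue j of b.
dots = bsxfun(@eq, a(:), b(:)');

% Matches on the main diagonal are positions conserved in place.
onDiagonal = sum(diag(dots));
fprintf('%d of %d positions on the main diagonal match\n', ...
        onDiagonal, length(a));

spy(dots);                        % quick visual check of the dot matrix
ylabel('sequence a'); xlabel('sequence b');
```

A long unbroken diagonal in such a plot indicates a region where the two sequences match without insertions or deletions.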
D. To perform a pairwise sequence alignment
Global Alignment
[score, globalAlignment] = nwalign(humanProtein,mouseProtein);
showalignment(globalAlignment);
The identity between the two sequences is 64.5 %. As a refinement step, include only the amino acids that are in the protein (up to a stop codon, marked '*') to get a better alignment.
humanStops = find(humanProtein == '*')
mouseStops = find(mouseProtein =='*')
humanSeq = humanProtein(1:humanStops(2));
humanSeqFormatted = seqdisp(humanSeq)
mouseSeq = mouseProtein(1:mouseStops(1));
mouseSeqFormatted = seqdisp(mouseSeq)
[score, alignment] = nwalign(humanSeq,mouseSeq);
showalignment(alignment);
The identity between the two sequences is 76 %. This is still not a perfect alignment at the beginning of the sequence. In order to align the sequences starting with Met1, we can go back to the information about the ORFs in the nucleotide sequences.
humanPORF = nt2aa(humanHEXA.Sequence(humanORFs(1).Start(1):humanORFs(1).Stop(1)));
mousePORF = nt2aa(mouseHEXA.Sequence(mouseORFs(1).Start(1):mouseORFs(1).Stop(1)));
[score, ORFAlignment] = nwalign(humanPORF,mousePORF);
showalignment(ORFAlignment);
identity between the two sequences is 84 %.
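A percentage identity of this kind can be computed by hand from the two rows of an alignment. The sketch below uses two short made-up aligned strings, with '-' marking a gap, and assumes that identity is counted over all alignment columns, which is one common convention; other tools may divide by a different length.

```matlab
% Two rows of a made-up alignment; '-' marks a gap (illustrative only).
row1 = 'MKV-LAAGIV';
row2 = 'MKILLA-GLV';

isGap       = (row1 == '-') | (row2 == '-');   % columns containing a gap
identical   = (row1 == row2) & ~isGap;         % columns with the same residue
pctIdentity = 100 * sum(identical) / length(row1);

fprintf('identities = %d/%d (%.1f %%)\n', ...
        sum(identical), length(row1), pctIdentity);
```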
Local Alignment:
Local alignment can be more informative than global alignment when only part of the sequences
is related. The function “swalign” performs local alignment using the Smith-Waterman
algorithm. This shows a very good alignment for the whole coding region of the gene.
[score, localAlignment] = swalign(humanProtein,mouseProtein);
showalignment(localAlignment);
The identity between the two sequences is 83 %.
Calculating Sequence Statistics:
1. basecount is used to count the number of bases of the whole sequence
bases = basecount(humanHEXA)
bases =
A: 507
C: 460
G: 541
T: 554
2. codoncount is used to count the number of codons in the given sequence
codonHEXA = codoncount(humanHEXA)
codonHEXA =
AAA: 13
AAC: 3
AAG: 4
AAT: 12
ACA: 9
ACC: 4
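What these statistics measure can be illustrated with base MATLAB on a short made-up DNA fragment. The sketch below is not the toolbox implementation; it counts each base directly and tallies codons in reading frame 1, and the sequence and variable names are assumptions for the example.

```matlab
seq   = 'ATGCGATACGCTTGA';        % made-up DNA fragment (illustrative only)
bases = 'ACGT';

% Number of occurrences of each base, in the order A, C, G, T.
counts = arrayfun(@(b) sum(seq == b), bases);

% Split the sequence into codons in reading frame 1 and tally them.
nCodons = floor(length(seq) / 3);
codons  = cellstr(reshape(seq(1:3*nCodons), 3, [])');
[uniqueCodons, ~, idx] = unique(codons);
codonCounts = accumarray(idx(:), 1);

for b = 1:4
    fprintf('%c: %d\n', bases(b), counts(b));
end
for c = 1:numel(uniqueCodons)
    fprintf('%s: %d\n', uniqueCodons{c}, codonCounts(c));
end
```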
STRUCTURAL ANALYSIS:
1. To view the molecular structure of aspirin.mol in MATLAB, type
molviewer('aspirin.mol')
2. View the molecule with a PDB identifier of 2DHB.
molviewer('2DHB')
RESULT AND INTERPRETATION :
1. In sequence alignment by Global Alignment, the percentage of identity increased
from 64.5% to 84% on refining the alignment. Local Alignment gave a comparable
percentage of identity (83%) without any refinement.
2. Sequence statistics were computed to count the bases and codons in
the sequence.
3. molviewer was used to view the structure of aspirin and of the molecule with PDB identifier 2DHB in MATLAB.
Expt No-5: MODELING OF BIOLOGICAL SYSTEMS Date:
AIM: To Model Biological systems.
INTRODUCTION:
Simple Model for Single Substrate Catalyzed Reactions. A simple model for enzyme-catalyzed
reactions starts with a substrate S reversibly binding with an enzyme E. Some of the
substrate in the substrate/enzyme complex is converted to product P with the release of the
enzyme.
S + E <-> ES -> E + P
(rate constants: k1 for binding, k1r for unbinding, k2 for conversion to product)
v1 = k1[S][E], v1r = k1r[ES], v2 = k2[ES]
SIGNIFICANCE OF SIMULATION
E + S <-> ES
This is the first reaction in the model of the biological system. The enzyme (E) and the
substrate (S) combine to form an enzyme-substrate complex (ES). At the beginning of the
reaction the ES concentration therefore increases, and the enzyme and substrate
concentrations decrease. By simulating these reactions we can predict the outcome for each
set of reactant concentrations and choose suitable starting conditions before going to the
lab, instead of repeating the experiment many times, which costs both time and money.
ES -> E + P
The enzyme-substrate complex then splits into the product (P) and the enzyme (E). The
product concentration therefore rises gradually from its initial value of zero, and the
enzyme concentration rises again because the enzyme is released along with the end product.
The simulation shows what concentrations of the compounds are needed to obtain a required
amount of product. The graph produced by the simulation plots the concentrations of the
compounds against time.
STEPS INVOLVED:
Step: 1 Type sbiodesktop in the MATLAB command window.
Step: 2 Go to the File tab in the desktop and click on the “new project” option.
Step: 3 A new project opens; in it, click the diagram view option.
Step: 4 The workspace opens, in which we design our model.
Step: 5 The reactions and species are drawn by dragging them into the compartment, and the
interactions between them are drawn by pressing ctrl and then clicking on the species or
reaction to be connected.
Step: 6 The reaction properties box is opened by clicking any of the reactions in the model.
Step: 7 Set the kinetic law option and then the kinetic law parameters for the chosen
kinetic law.
Step: 8 The model can be simulated only when the reaction is correctly configured. This can
be identified by the check box near the reaction rate option: if it is green, we
can carry on with the simulation.
Step: 9 Interpret the required result from the graph obtained by simulating the model.
Parameter Details:
Reaction: null -> P
Reaction rate: k mole/second
Species: P = 0 mole
Parameters: k = 1 mole/second
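The zero order model above can also be written as a single ODE, dP/dt = k, and integrated with a general-purpose solver. The sketch below uses ode45 rather than SimBiology, which is a swapped-in technique for illustration; the 10 second time span is an assumption, while k and the initial amount come from the parameter details.

```matlab
k  = 1;                                  % mole/second, from the parameter list
P0 = 0;                                  % initial amount of P in mole

% Zero order kinetics: the rate does not depend on any concentration.
[t, P] = ode45(@(t, P) k, [0 10], P0);

maxErr = max(abs(P - (P0 + k * t)));     % compare with the line P = k*t
fprintf('P(10 s) = %.4f mole\n', P(end));
```

Because the rate is constant, the product amount is expected to grow along a straight line with slope k.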
RESULT (ZERO ORDER REACTION) :
STEPS INVOLVED FOR FIRST ORDER REACTION:
Step: 1 Type sbiodesktop in the MATLAB command window.
Step: 2 Go to the File tab in the desktop and click on the “new project” option.
Step: 3 A new project opens; in it, click the diagram view option.
Step: 4 The workspace opens, in which we design our model.
Step: 5 The reactions and species are drawn by dragging them into the compartment, and the
interactions between them are drawn by pressing ctrl and then clicking on the species or
reaction to be connected.
Step: 6 The reaction properties box is opened by clicking any of the reactions in the model.
Step: 7 Set the kinetic law option and then the kinetic law parameters. Once the kinetic law
is chosen for this reaction, the initial value of the reactant is entered as 10.0 moles.
Step: 8 The model can be simulated only when the reaction is correctly configured. This can be
identified by the check box near the reaction rate option: if it is green, we can
carry on with the simulation.
Step: 9 Interpret the required result from the graph obtained by simulating the model.
reaction: R -> P
reaction rate: k*R mole/(liter*second)
species: R = 10 mole/liter
P = 0 mole/liter
parameters: k = 1 1/second
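For the first order reaction R -> P the rate equations are dR/dt = -k*R and dP/dt = k*R, with the closed-form solution R(t) = R0*exp(-k*t). The sketch below integrates them with ode45 as a stand-in for the SimBiology simulation; the 5 second time span and the tolerances are assumptions, and the remaining values come from the parameter details.

```matlab
k  = 1;                                   % 1/second
R0 = 10;  P0 = 0;                         % mole/liter

rhs  = @(t, y) [-k * y(1); k * y(1)];     % y = [R; P]
opts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[t, y] = ode45(rhs, [0 5], [R0; P0], opts);
R = y(:, 1);  P = y(:, 2);

errR     = max(abs(R - R0 * exp(-k * t)));   % compare with exponential decay
halfLife = log(2) / k;                       % time for R to fall to R0/2
fprintf('half-life = %.3f s, max error = %.2e\n', halfLife, errR);
```

The sum R + P stays equal to R0 throughout, which is a convenient sanity check on any simulation of this model.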
STEPS INVOLVED FOR SECOND ORDER REACTION:
Step:1 Type sbiodesktop in the MATLAB command window.
Step:2 Go to the File tab in the desktop and click on the “new project” option.
Step:3 A new project opens; in it, click the diagram view option.
Step:4 The workspace opens, in which we design our model.
Step:5 The reactions and species are drawn by dragging them into the compartment, and the
interactions between them are drawn by pressing ctrl and then clicking on the species or
reaction to be connected.
Step:6 The reaction properties box is opened by clicking any of the reactions in the model.
Step:7 Set the kinetic law option and then the kinetic law parameters. For this reaction the law
of mass action was chosen, and the initial value of the reactant is entered as 10.0 moles.
Step:8 The model can be simulated only when the reaction is correctly configured. This can
be identified by the check box near the reaction rate option: if it is green, we
can carry on with the simulation.
Step:9 Interpret the required result from the graph obtained by simulating the model.
reaction: R1 + R2 -> P
reaction rate: k*R1*R2 mole/(liter*second)
species: R1 = 10 mole/liter
R2 = 8 mole/liter
P = 0 mole/liter
parameters: k = 1 liter/(mole*second)
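The second order model R1 + R2 -> P follows the rate law k*R1*R2. The sketch below integrates it with ode45 in place of SimBiology and compares the product curve with the standard closed-form solution for unequal initial concentrations; the 5 second time span, the tolerances and the variable names are assumptions, and the remaining values come from the parameter details.

```matlab
k   = 1;                                  % liter/(mole*second)
R10 = 10;  R20 = 8;                       % mole/liter

% y = [R1; R2; P]; each reaction event consumes one R1 and one R2.
rhs  = @(t, y) k * y(1) * y(2) * [-1; -1; 1];
opts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[t, y] = ode45(rhs, [0 5], [R10; R20; 0], opts);
R1 = y(:, 1);  R2 = y(:, 2);  P = y(:, 3);

% Closed-form extent of reaction for R10 ~= R20.
E      = exp((R10 - R20) * k * t);
Pexact = R10 * R20 * (E - 1) ./ (R10 * E - R20);

fprintf('final P = %.4f mole/liter (limited by R2 = %g)\n', P(end), R20);
```

Since R2 starts lower than R1, it is the limiting reactant: the product is expected to level off near 8 mole/liter while about 2 mole/liter of R1 remains.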
CREATING A MODEL:
The following reactions should be constructed for the model:
S + E -> ES
ES -> S + E
ES -> E + P
STEPS INVOLVED :
Step:1 Type sbiodesktop in the MATLAB command window.
Step:2 Go to the File tab in the desktop and click on the “new project” option.
Step:3 A new project opens; in it, click the diagram view option.
Step:4 The workspace opens, in which we design our model.
Step:5 The reactions and species are drawn by dragging them into the compartment, and the
interactions between them are drawn by pressing ctrl and then clicking on the species or
reaction to be connected.
Step:6 The reaction properties box is opened by clicking any of the reactions in the model.
Step:7 Set the kinetic law option and then the kinetic law parameters. Under the kinetic law
option we select the law of mass action; then the initial value of the reactant is entered as
10.0 moles.
Step:8 The model can be simulated only when the reaction is correctly configured. This can
be identified by the check box near the reaction rate option: if it is green, we
can carry on with the simulation.
Step:9 Interpret the required result from the graph obtained by simulating the model.
reaction: S + E -> ES
reaction rate: k1*S*E (binding)
reaction: ES -> S + E
reaction rate: k1r*ES (unbinding)
reaction: ES -> E + P
reaction rate: k2*ES (transformation)
species: S = 8 mole
E = 4 mole
ES = 0 mole
P = 0 mole
parameters: k1 = 2 1/(mole*second)
k1r = 1 1/second
k2 = 1.5 1/second
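The three reactions and their mass-action rates can be written as a small ODE system and integrated outside SimBiology. The sketch below uses ode45 with a stoichiometry matrix; this is a swapped-in technique for illustration, and the 20 second time span, the tolerances and the names N and rates are assumptions. The rate constants and initial amounts are the ones listed above.

```matlab
% Rate constants and initial amounts taken from the parameter list.
k1 = 2;  k1r = 1;  k2 = 1.5;
y0 = [8; 4; 0; 0];                        % [S; E; ES; P] in mole

% Stoichiometry: columns are binding, unbinding, transformation.
N = [-1  1  0;                            % S
     -1  1  1;                            % E
      1 -1 -1;                            % ES
      0  0  1];                           % P

% Mass-action rates v1 = k1*S*E, v1r = k1r*ES, v2 = k2*ES.
rates = @(y) [k1 * y(1) * y(2); k1r * y(3); k2 * y(3)];
rhs   = @(t, y) N * rates(y);

opts   = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[t, y] = ode45(rhs, [0 20], y0, opts);

plot(t, y, 'LineWidth', 2);
xlabel('time (second)'); ylabel('amount (mole)');
legend('S', 'E', 'ES', 'P');
```

Two conserved totals give a quick check on the result: E + ES should stay at 4 mole and S + ES + P at 8 mole for the whole run.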
RESULT AFTER RUNNING THE MODEL:
RESULT AND INTERPRETATION :
The simulation shows how the concentrations of the compounds vary with time. The substrate
starts at S = 8 mole and decreases gradually as it forms the enzyme-substrate complex; after
some time its concentration becomes constant at zero. The enzyme starts at E = 4 mole. Its
concentration decreases once the reaction starts and, when it is close to zero, begins to
increase again, finally becoming constant on reaching its initial value, because the enzyme
is released along with the end product. The enzyme-substrate complex and the end product
start at ES = 0 mole and P = 0 mole. Once the reaction has started the ES concentration
increases, and when dissociation of the complex takes over it decreases again with time. The
product concentration increases gradually throughout the simulation, which tells us the
amount of product formed from a particular concentration of reactants. The entire 8 mole of
substrate is transformed into the end product P. We can therefore predict the reactant
concentrations required for a given need by simulating biological models in MATLAB.