Matlab Bioinformatics Toolkit Evaluation

Kanishka Bhutani

Post on 13-Jan-2016


Page 1: Matlab Bioinformatics Toolkit  Evaluation

Matlab Bioinformatics Toolkit Evaluation

Kanishka Bhutani

Page 2: Matlab Bioinformatics Toolkit  Evaluation

What I expected?

Local/global sequence alignments.

Multiple sequence alignments.

Choice of different scoring matrices (BLOSUM, PAM) for evaluation.

Build Hidden Markov Models.

Easily import sequences from databases (PFAM, PDB, Swiss-Prot).

Page 3: Matlab Bioinformatics Toolkit  Evaluation

What I found?

Most of the features.

"Bonus":

Microarray normalization tools.

Microarray visualization tools, including box plots and heat maps.

Page 4: Matlab Bioinformatics Toolkit  Evaluation

Any surprises?

No "multiple sequence alignments".

Avg./std. dev. of hydrophobicity, solvent accessibility: command?

"proteinplot": GUI for protein structure analysis.

Import your file to view, select parameters and display stats.

Page 5: Matlab Bioinformatics Toolkit  Evaluation

What all I tried?

Local alignment, global alignment.

For short sequences:

swalign('seq1','seq2')

nwalign('seq1','seq2')

seq1, seq2: AA or NT sequences.

For "imported" long sequences:

Convert the sequence into a vector of integer values.

Commands: nt2int, aa2int
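The short-sequence calls above can be tried as in the following minimal sketch. It assumes the Bioinformatics Toolbox is installed. The two peptide strings are arbitrary examples, not sequences from the slides, and the 'ScoringMatrix' option illustrates the BLOSUM/PAM choice listed under expectations.

```matlab
% Sketch only: requires the Bioinformatics Toolbox.
% seq1 and seq2 are arbitrary example peptides (assumption, not from the slides).
seq1 = 'VSPAGMASGYD';
seq2 = 'IPGKASYD';

% Global (Needleman-Wunsch) and local (Smith-Waterman) alignment
% with the default amino-acid scoring matrix.
[gScore, gAln] = nwalign(seq1, seq2);
[lScore, lAln] = swalign(seq1, seq2);

% Same global alignment, scored with a PAM matrix instead.
[pScore, pAln] = nwalign(seq1, seq2, 'ScoringMatrix', 'pam250');

% Each alignment is a 3-row character array:
% sequence 1, match symbols, sequence 2.
disp(gAln)
disp(lAln)
```

Comparing gScore with pScore gives a quick feel for how much the choice of scoring matrix changes the result for a given pair.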

Page 6: Matlab Bioinformatics Toolkit  Evaluation

Pairwise sequence alignment

S = getgenbank('NM_00001')

M = getgenbank('NM_00002')

Output: header and a sequence.

K = nt2int(S.Sequence)

B = nt2int(M.Sequence)

[sc, align] = nwalign(K, B)

Outputs: alignment score (sc) and aligned sequences (align).
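The integer-conversion route can be checked offline with literal nucleotide strings in place of the GenBank records. This is a sketch under assumptions: the two short sequences are made up, and the 'Alphabet','NT' option is added on the understanding that nwalign otherwise treats its input as amino acids, so integer codes 1 to 4 would be scored as residues rather than bases.

```matlab
% Sketch only: requires the Bioinformatics Toolbox.
% s1 and s2 are made-up nucleotide strings standing in for S.Sequence and M.Sequence.
s1 = 'ACGTTGCAAGGCT';
s2 = 'ACGTGCAAGCT';

% Convert letters to integer codes (A=1, C=2, G=3, T=4).
K = nt2int(s1);
B = nt2int(s2);

% Tell nwalign that the integers encode nucleotides, not amino acids.
[sc, aln] = nwalign(K, B, 'Alphabet', 'NT');

disp(sc)
disp(aln)
```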

Page 7: Matlab Bioinformatics Toolkit  Evaluation

Getting sequences: very easy!

"getgenbank": Retrieve sequence information from the GenBank database.

"getembl": Retrieve sequence information from the EMBL database.

"getgenpept": Retrieve sequence information from the GenPept database.

"gethmmprof": Get an HMM from the PFAM database.

Page 8: Matlab Bioinformatics Toolkit  Evaluation

Experiment

hmmodel = gethmmprof('PF00001')

Page 9: Matlab Bioinformatics Toolkit  Evaluation

Visualization of model

showhmmprof(hmmodel,'scale','logodds')

Page 10: Matlab Bioinformatics Toolkit  Evaluation

Get GPCR seq's

S = getgenbank('NM_024531')

disp(S.Sequence)

Page 11: Matlab Bioinformatics Toolkit  Evaluation

Alignment of the seq's

var = gethmmalignment('PF00001','type','seed')

disp([char(var.Header) char(var.Sequence)])

Page 12: Matlab Bioinformatics Toolkit  Evaluation

For GPCR Family C

Similarly for different families.

Multiple aligned sequences retrieved.

Page 13: Matlab Bioinformatics Toolkit  Evaluation

GUI proteinplot

User friendly.

Avg./std. dev. values for:

Hydrophobicity.

Secondary structure propensity (alpha helices or beta strands).

Accessibility (accessible and buried residues).

Page 14: Matlab Bioinformatics Toolkit  Evaluation

Mglur1 plot (proteinplot)

Page 15: Matlab Bioinformatics Toolkit  Evaluation

Mglur1 results

Parameter             Average (%)   Std. Dev. (%)
Accessible residues   5.04          1.25
Buried residues       8.22          1.816
Alpha helix           0.89          0.1565
Beta sheet            0.97          0.1038
Hydrophobicity        3.01          0.9608
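The slides do not name a command-line route to these averages. As a rough stand-in for the hydrophobicity row only, a few lines of base MATLAB can average a per-residue scale over a sequence. The Kyte-Doolittle values and the toy peptide below are assumptions for illustration, not what proteinplot used, so the numbers will not match the table.

```matlab
% Sketch only: base MATLAB, no toolbox needed.
% Kyte-Doolittle hydropathy values, in the order given by aaLetters (assumed scale).
aaLetters = 'ARNDCQEGHILKMFPSTWYV';
kdValues  = [ 1.8 -4.5 -3.5 -3.5  2.5 -3.5 -3.5 -0.4 -3.2  4.5 ...
              3.8 -3.9  1.9  2.8 -1.6 -0.8 -0.7 -0.9 -1.3  4.2];

seq = 'AILVKR';   % toy peptide (assumption); replace with a real protein sequence

% Map each residue to its position in aaLetters; unknown letters are skipped.
[known, idx] = ismember(upper(seq), aaLetters);
profile = kdValues(idx(known));

avgHyd = mean(profile);   % average hydropathy over the sequence
sdHyd  = std(profile);    % sample standard deviation

fprintf('Average: %.4f, Std. dev.: %.4f\n', avgHyd, sdHyd);
```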

Page 16: Matlab Bioinformatics Toolkit  Evaluation

Test a seq. with HMM

Retrieve mglur1 from GenBank:

mgr = getgenbank('NM_012407')

glusequence = mgr.Sequence

Test it with the HMM model for class A:

[a, sglu] = hmmprofalign(modelA, glusequence, 'showscore', true)

Score = -203.53

Seq =

Page 17: Matlab Bioinformatics Toolkit  Evaluation

Log-odds score plot for best path

Page 18: Matlab Bioinformatics Toolkit  Evaluation

Difficulties & questions

No multiple sequence alignment.

Demos: not very helpful.

Difficult to view the sequences, as no "disp" command was found.

Bugs:

Storing huge sequences (GPCR A) in a file gives a parsing error.

The hmmprofdemo command stops abruptly and gives errors.

proteinplot (GUI) often hangs the machine.

Verify the sequences using the HMM models?

Regular expression matches and highlighting those positions?
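On the last question, one possible approach uses MATLAB's built-in regexp, which returns the start and end index of every match. The sketch below marks matches by upper-casing them in an otherwise lower-case string. The motif pattern and the toy sequence are assumptions chosen for illustration.

```matlab
% Sketch only: base MATLAB, no toolbox needed.
seq     = 'MKNGSAPLNPTQNVTR';   % toy peptide (assumption)
pattern = 'N[^P][ST][^P]';      % example motif: N, not P, S or T, not P

% Start and end positions of every non-overlapping match.
[startIdx, endIdx] = regexp(seq, pattern);

% Highlight matches: lower-case everything, then upper-case the matched spans.
marked = lower(seq);
for k = 1:numel(startIdx)
    span = startIdx(k):endIdx(k);
    marked(span) = upper(marked(span));
end

disp(startIdx)   % match start positions
disp(marked)     % sequence with matches in upper case
```

For this toy input the matches start at positions 3 and 13, and the marked string is mkNGSAplnptqNVTR.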

Page 19: Matlab Bioinformatics Toolkit  Evaluation

Suggested experiment

Given an unknown sample dataset of proteins and a known dataset of proteins (with known structural information):

Use the BLMT to extract "over-expressed" 4-grams in a protein sequence, or a group of protein sequences, from the known set.

Use the "search for regular expression" function in the Matlab toolkit to look for those 4-grams in the unknown proteins and hence predict their structure.
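The two steps of this experiment can be mocked up in base MATLAB as a rough sketch. It does not use the BLMT: a plain frequency count stands in for "over-expressed", and the known and unknown sequences are toy strings invented for illustration.

```matlab
% Sketch only: base MATLAB, no toolbox needed.
% Toy data (assumptions): a small "known" set and one "unknown" sequence.
known   = {'MKTAYIAKQR', 'GGMKTAYLLP', 'AAMKTAQQ'};
unknown = 'PLMKTAWWMKTA';
n = 4;   % n-gram length

% Step 1: count every 4-gram in the known set.
counts = containers.Map('KeyType', 'char', 'ValueType', 'double');
for s = 1:numel(known)
    seq = known{s};
    for i = 1:(length(seq) - n + 1)
        gram = seq(i:i+n-1);
        if isKey(counts, gram)
            counts(gram) = counts(gram) + 1;
        else
            counts(gram) = 1;
        end
    end
end

% Pick the most frequent 4-gram (simple stand-in for "over-expressed").
grams = keys(counts);
freq  = cell2mat(values(counts));
[~, order] = sort(freq, 'descend');
topGram = grams{order(1)};

% Step 2: look for that 4-gram in the unknown sequence.
hits = regexp(unknown, topGram);   % start positions of matches

fprintf('Top 4-gram: %s, found at positions: %s\n', topGram, mat2str(hits));
```

A real run would need a statistical definition of over-expression, for example comparing observed counts against a background frequency, which a raw count does not capture.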