what is bioinformatics?. what are bioinformaticians up to, actually? manage molecular biological...

32
What is bioinformatics?

Upload: peregrine-glenn

Post on 26-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

What is bioinformatics?

Page 2: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

What are bioinformaticians up to, actually?

• Manage molecular biological data– Store in databases, organise, formalise, describe...

• Compare molecular biological data• Find patterns in molecular biological data

– phylogenies

– correlations (sequence / structure / expression / function / disease)

Goals:• characterise biological patterns & processes• predict biological properties

– low level data high level properties ⇒(eg., sequence function)⇒

Page 3: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Bioinformatics: neighbour disciplines

• Computational biology– Broader concept: includes computational ecology,

physiology, neurology etc...

• -omics:– Genomics– Transcriptomics– Proteomics

• Systems biology– Putting it all together...– Building models, identify control & regulation

Page 4: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Bioinformatics: prerequisites

• Bio- side:– Molecular biology– Cell biology – Genetics– Evolutionary theory

• -informatics side:– Computer science– Statistics– Theoretical physics

Page 5: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Molecular biology data...

>alpha-DATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCACCCAGACTGTGGAGCCGAGGCCCTGGAGAGGTGCGGGCTGAGCTTGGGGAAACCATGGGCAAGGGGGGCGACTGGGTGGGAGCCCTACAGGGCTGCTGGGGGTTGTTCGGCTGGGGGTCAGCACTGACCATCCCGCTCCCGCAGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACTTGCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGGCCGCCTTGGGCAACGCTGTCAAGAGCCTGGGCAACCTCAGCCAAGCCCTGTCTGACCTCAGCGACCTGCATGCCTACAACCTGCGTGTCGACCCTGTCAACTTCAAGGCAGGCGGGGGACGGGGGTCAGGGGCCGGGGAGTTGGGGGCCAGGGACCTGGTTGGGGATCCGGGGCCATGCCGGCGGTACTGAGCCCTGTTTTGCCTTGCAGCTGCTGGCGCAGTGCTTCCACGTGGTGCTGGCCACACACCTGGGCAACGACTACACCCCGGAGGCACATGCTGCCTTCGACAAGTTCCTGTCGGCTGTGTGCACCGTGCTGGCCGAGAAGTACAGATAA>alpha-AATGGTGCTGTCTGCCAACGACAAGAGCAACGTGAAGGCCGTCTTCGGCAAAATCGGCGGCCAGGCCGGTGACTTGGGTGGTGAAGCCCTGGAGAGGTATGTGGTCATCCGTCATTACCCCATCTCTTGTCTGTCTGTGACTCCATCCCATCTGCCCCCATACTCTCCCCATCCATAACTGTCCCTGTTCTATGTGGCCCTGGCTCTGTCTCATCTGTCCCCAACTGTCCCTGATTGCCTCTGTCCCCCAGGTTGTTCATCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACCTGTCACATGGCTCCGCTCAGATCAAGGGGCACGGCAAGAAGGTGGCGGAGGCACTGGTTGAGGCTGCCAACCACATCGATGACATCGCTGGTGCCCTCTCCAAGCTGAGCGACCTCCACGCCCAAAAGCTCCGTGTGGACCCCGTCAACTTCAAAGTGAGCATCTGGGAAGGGGTGACCAGTCTGGCTCCCCTCCTGCACACACCTCTGGCTACCCCCTCACCTCACCCCCTTGCTCACCATCTCCTTTTGCCTTTCAGCTGCTGGGTCACTGCTTCCTGGTGGTCGTGGCCGTCCACTTCCCCTCTCTCCTGACCCCGGAGGTCCATGCTTCCCTGGACAAGTTCGTGTGTGCCGTGGGCACCGTCCTTACTGCCAAGTACCGTTAA

• DNA sequences

Page 6: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Molecular biology data...

• Amino acid sequences

• Protein structure:– X-ray crystallography

– NMR

Page 7: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Cell biology & proteomics data...

• Subcellular localization

Page 8: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Cell biology & proteomics data...

protein-protein interactions

Page 9: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

DNA microarray technologyTranscriptomics: DNA microarray technology

Page 10: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Proteomics & transcriptomics data

Proteins encoded by periodically expressed genes: a functionally diverse protein category

Page 11: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Cell cycle regulation

Multiprotein complexes in yeast (Saccharomyces cerevisiae).

Coloured components are periodically expressed (dynamic). White components are always expressed (static).

Complexes contain both kinds of components (just-in-time assembly).

Dynamic complex formation during the yeast cell cycle.Science. 2005 Feb 4;307(5710):724-7.

de Lichtenberg U, Jensen LJ, Brunak S, Bork P.

Page 12: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Cell cycle regulation 2

Transcriptional regulation is poorly conserved!

Lars Juhl Jensen, Thomas Skøt Jensen, Ulrik de Lichtenberg, Søren Brunak and Peer Bork, Nature, Oct 5, 2006

Dynamic components overlap:

Page 13: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Phenotype data: human diseases

Page 14: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Prediction methods

• Homology / Alignment

• Simple pattern (“word”) recognition • Statistical methods

– Weight matrices: calculate amino acid probabilities– Other examples: Regression, variance analysis, clustering

• Machine learning– Like statistical methods, but parameters are estimated by

iterative training rather than direct calculation– Examples: Neural Networks (NN), Hidden Markov Models

(HMM), Support Vector Machines (SVM)

• Combinations

Page 15: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Simple pattern (“word”) recognition

Example: PROSITE entry PS00014, ER_TARGET:

Endoplasmic reticulum targeting sequence (”KDEL-signal”).

Pattern: [KRHQSA]-[DENQ]-E-L>

NB: only yes/no answers!

Page 16: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Statistical methods

ALAKAAAAMALAKAAAANALAKAAAARALAKAAAATALAKAAAAVGMNERPILTGILGFVFTMTLNAWVKVVKLNEPVLLLAVVPFIVSV

• Estimate probabilities for nucleotides / amino acids• Information content in sequences; logos; Position-

Weight Matrices.• Quantitative answers.

Page 17: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

ADDGSLAFVPSEF--SISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKISMSEEDLLN TVNGAI--PGPLIAERLKEGQNVRVTNTLDEDTSIHWHGLLVPFGMDGVPGVSFPG---I-TSMAPAFGVQEFYRTVKQGDEVTVTIT-----NIDQIED-VSHGFVVVNHGVSME---IIE--KMKYLTPEVFYTIKAGETVYWVNGEVMPHNVAFKKGIV--GEDAFRGEMMTKD----TSVAPSFSQPSF-LTVKEGDEVTVIVTNLDE------IDDLTHGFTMGNHGVAME---VASAETMVFEPDFLVLEIGPGDRVRFVPTHK-SHNAATIDGMVPEGVEGFKSRINDE----TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI

Sequence profiles

Conserved

Non-conserved

Matching anything but G => large negative score

Anything can match

TKAVVLTFNTSVEICLVMQGTSIV----AAESHPLHLHGFNFPSNFNLVDPMERNTAGVP

TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI

Unlike Position-Weight Matrices, sequence profiles are able to handle gaps

Page 18: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

HMM (Hidden Markov Models)

Which parts of the sequence are

– in the cytoplasm?

– embedded in the membrane (TM-helices)?

– outside?

Rule-of-thumb says amino acids in the TM helices are hydrophobic – but there are exceptions

TMHMM predicts topology from sequence using a statistical model

TMHMM: predicting the topology of α-helix transmembrane proteins

Page 19: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

IKEEHVI IQAE

HEC

IKEEHVIIQAEFYLNPDQSGEF…..Window

Input Layer

Hidden Layer

Output Layer

Weights

Neural Networks: ”Architecture”

Page 20: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Neural networks

• Neural networks can learn higher order correlations!– What does this

mean?

A A => 0A C => 1C A => 1C C => 0

No linear function can learn this pattern

Neural Networks: advantages

Page 21: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

How to estimate parameters for prediction?

Page 22: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Model selection

Linear Regression Quadratic Regression Join-the-dots

Page 23: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

The test set method

Page 24: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

The test set method

Page 25: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

The test set method

Page 26: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

The test set method

Page 27: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

The test set method

Page 28: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Problem: sequences are related

• If the sequences in the test set are closely related to those in the training set, we can not measure true generalization performanceALAKAAAAM

ALAKAAAANALAKAAAARALAKAAAATALAKAAAAVGMNERPILTGILGFVFTMTLNAWVKVVKLNEPVLLLAVVPFIVSV

Page 29: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Protein sorting in eukaryotes

• Proteins belong in different organelles of the cell – and some even have their function outside the cell

• Günter Blobel was in 1999 awarded The Nobel Prize in Physiology or Medicine for the discovery that "proteins have intrinsic signals that govern their transport and localization in the cell"

Page 30: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Secretory proteins have a signal peptide

Initially, they are transported across the ER membrane

Protein sorting: secretory pathway / ER

Page 31: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

Signal peptides

A signal peptide is an N-terminal part of the amino acid chain, containing a hydrophobic region.

Signal peptides differ between proteins, and can be hard to recognize.

Page 32: What is bioinformatics?. What are bioinformaticians up to, actually? Manage molecular biological data –Store in databases, organise, formalise, describe

SignalP

http://www.cbs.dtu.dk/services/SignalP/

Method for prediction of signal peptides from the amino acid sequence

Based on Neural Networks

The first SignalP paper (Nielsen H, et al., Protein Eng. 10[1]: 1-6, January 1997) was cited 3144 times in a 10 year period