bioinformatica 29-09-2011-t1-bioinformatics

Upload: wvcrieki

Post on 11-May-2015

DESCRIPTION

Slides for bioinformatics

TRANSCRIPT

Page 1: Bioinformatica 29-09-2011-t1-bioinformatics
Page 2: Bioinformatica 29-09-2011-t1-bioinformatics

FBW 29-09-2011

Wim Van Criekinge

Page 3: Bioinformatica 29-09-2011-t1-bioinformatics
Page 4: Bioinformatica 29-09-2011-t1-bioinformatics

What is Bioinformatics ?

• Application of information technology to the storage, management and analysis of biological information (facilitated by the use of computers)
– Sequence analysis?
– Molecular modeling (HTX)?
– Phylogeny/evolution?
– Ecology and population studies?
– Medical informatics?
– Image analysis?
– Statistics? AI?
– Power engineering or electronics (“sterkstroom of zwakstroom”)?

Page 5: Bioinformatica 29-09-2011-t1-bioinformatics

• Medicine (Pharma)
– Genome analysis allows the targeting of genetic diseases
– The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
– Knowledge of protein structure facilitates drug design
– Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up

• The same techniques can be applied to crop (Agro) and livestock improvement (Animal Health)

Promises of genomics and bioinformatics

Page 6: Bioinformatica 29-09-2011-t1-bioinformatics

Bioinformatics: What’s in a name ?

• Early 1990s
• “Bio-informatics”:

[Figure: computing power and GenBank size (log scale) plotted against time (years)]

Page 7: Bioinformatica 29-09-2011-t1-bioinformatics

Bioinformatics: What’s in a name ?

• Early 1990s
• “Bio-informatics”:

– convergence of the explosive growth in biotechnology, paralleled by the explosive growth in information technology

• Not new: people have been using “computers” in biology for more than 30 years

• In silico biology, database biology, ...

Page 8: Bioinformatica 29-09-2011-t1-bioinformatics

[Figure: timeline; axis: time (years)]

Page 9: Bioinformatica 29-09-2011-t1-bioinformatics

Happy Birthday …

Page 10: Bioinformatica 29-09-2011-t1-bioinformatics

PCR + dye termination

Suddenly, a flash of insight caused him to pull the car off the road and stop. He awakened his friend dozing in the passenger seat and excitedly explained to her that he had hit upon a solution - not to his original problem, but to one of even greater significance. Kary Mullis had just conceived of a simple method for producing virtually unlimited copies of a specific DNA sequence in a test tube - the polymerase chain reaction (PCR)

Page 11: Bioinformatica 29-09-2011-t1-bioinformatics

Math

Informatics

Bioinformatics, a scientific discipline …

Theoretical Biology

Computational Biology

(Molecular) Biology

Computer Science

Bioinformatics

Page 12: Bioinformatica 29-09-2011-t1-bioinformatics

Math Algorithm Development

Informatics

Interface Design

Bioinformatics, a scientific discipline …

AI, Image Analysis

structure prediction (HTX)

Theoretical Biology

Sequence Analysis

Computational Biology

(Molecular) Biology

Expert Annotation

Computer Science

NP

Datamining

Bioinformatics

Page 13: Bioinformatica 29-09-2011-t1-bioinformatics

Math Algorithm Development

Informatics

Interface Design

Bioinformatics, a scientific discipline …

AI, Image Analysis

structure prediction (HTX)

Theoretical Biology

Sequence Analysis

Computational Biology

(Molecular) Biology

Expert Annotation

Computer Science

NP

Datamining

Bioinformatics

Discovery Informatics – Computational Genomics

Page 14: Bioinformatica 29-09-2011-t1-bioinformatics

Aim of the course

• More than an introduction to ...: the course aims to give insight into what underlies the various techniques.

• Beyond the use of recipes, which can be found in parts of the syllabus, insight into
– how databases work
– and the underlying algorithms

• makes it possible to apply changing interfaces to new problems.

Page 15: Bioinformatica 29-09-2011-t1-bioinformatics

Lesson content: Bioinformatics

• Thu 29-09-2011: 1* Bioinformatics (practical 8.30-11.00)
• Thu 06-10-2011: 2* Biological Databases (practical 9.00-11.30)
• Thu 20-10-2011: 3 Sequence Similarity (Scoring Matrices)
• Thu 27-10-2011: 4 Sequence Alignments
• Thu 10-11-2011: 5 Database Searching Fasta/Blast
• Thu 17-11-2011: 6 Phylogenetics
• Thu 24-11-2011: 7 Protein Structure
• Thu 01-12-2011: 8 Gene Prediction, Gene Ontologies & HMM
• Thu 08-12-2011: 9 ncRNA, Chip Data Analysis, AI
• Thu 15-12-2011: 10 Bio- & Cheminformatics in Drug Discovery (catch-up week)
• Note: no class on Thu 13-10-2011 and Thu 3-11-2011

Page 16: Bioinformatica 29-09-2011-t1-bioinformatics

Exam

• Theory
– A part on a publication of your own choice that relates to the course
• E.g. from Bioinformatics or Computational Biology
– Three insight questions on the course (inclusive !!)

• Practical (“open-book”)
– About four exercises, most of which require writing a program

• Marks are split 50/50

Page 17: Bioinformatica 29-09-2011-t1-bioinformatics
Page 18: Bioinformatica 29-09-2011-t1-bioinformatics

• Timeline: Margaret Dayhoff …

Page 20: Bioinformatica 29-09-2011-t1-bioinformatics

• http://www.sciencemag.org/cgi/content/full/291/5507/1195

• Printed version in the course notes

Page 21: Bioinformatica 29-09-2011-t1-bioinformatics

[Figure: Nature cover, “the Human genome”]

Setting the stage …

Page 22: Bioinformatica 29-09-2011-t1-bioinformatics
Page 23: Bioinformatica 29-09-2011-t1-bioinformatics
Page 24: Bioinformatica 29-09-2011-t1-bioinformatics
Page 25: Bioinformatica 29-09-2011-t1-bioinformatics

Genome Meters

• Genomes Online Database (GOLD 1.0)
– http://geta.life.uiuc.edu/~nikos/genomes.html
– http://www.ebi.ac.uk/research/cgg/genomes.html
• NCBI
– http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/bact.html
• INFOBIOGEN
– http://www.infobiogen.fr/doc/data/complete_genome.html

Page 26: Bioinformatica 29-09-2011-t1-bioinformatics

Genome Size

DOGS: Database Of Genome Sizes

E. coli = 4.2 x 10^6

Yeast = 18 x 10^6

Arabidopsis = 80 x 10^6

C. elegans = 100 x 10^6

Drosophila = 180 x 10^6

Human/Rat/Mouse = 3000 x 10^6

Lily = 300 000 x 10^6

With ... : 99.9%

To primates: 99%
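To put the sizes above side by side, here is a minimal Python sketch that compares each genome with E. coli and estimates raw storage at 2 bits per base. The figures are copied from the slide and are approximate; the 2-bit packing and the function names are illustrative assumptions, not something the slides specify.

```python
# Genome sizes in base pairs, copied from the slide (approximate figures).
GENOME_SIZES_BP = {
    "E. coli": 4.2e6,
    "Yeast": 18e6,
    "Arabidopsis": 80e6,
    "C. elegans": 100e6,
    "Drosophila": 180e6,
    "Human/Rat/Mouse": 3000e6,
    "Lily": 300000e6,
}


def relative_size(organism, reference="E. coli"):
    """Return how many times larger `organism`'s genome is than `reference`'s."""
    return GENOME_SIZES_BP[organism] / GENOME_SIZES_BP[reference]


def megabytes_at_two_bits_per_base(size_bp):
    """Storage in MB if each base (A, C, G, T) is packed into 2 bits (assumption)."""
    return size_bp * 2 / 8 / 1e6


for name, size in GENOME_SIZES_BP.items():
    print(f"{name:16s} {size:10.3g} bp "
          f"{relative_size(name):10.1f} x E. coli "
          f"{megabytes_at_two_bits_per_base(size):10.1f} MB")
```

On these figures the lily genome is 100 times the size of the human genome, which is one way to see that genome size does not track organism complexity.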

Page 27: Bioinformatica 29-09-2011-t1-bioinformatics
Page 28: Bioinformatica 29-09-2011-t1-bioinformatics

Biological Research

Adapted from John McPherson, OICR

Page 29: Bioinformatica 29-09-2011-t1-bioinformatics

And this is just the beginning ….

Next Generation Sequencing is here

Page 30: Bioinformatica 29-09-2011-t1-bioinformatics

Basics of the “old” technology

• Clone the DNA.
• Generate a ladder of labeled (colored) molecules that differ by 1 nucleotide.
• Separate the mixture on some matrix.
• Detect the fluorochrome by laser.
• Interpret the peaks as a string of DNA.
• Strings are 500 to 1,000 letters long.
• 1 machine generates 57,000 nucleotides/run.
• Assemble all strings into a genome.
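The throughput figure above can be turned into a rough back-of-the-envelope estimate. The sketch below counts how many runs of 57,000 nucleotides would be needed to generate a given coverage of a 3000 x 10^6 bp genome; the 8x coverage value is an illustrative assumption, and the calculation ignores failed runs and overlap requirements for assembly.

```python
import math

BASES_PER_RUN = 57_000             # from the slide: one machine, one run
HUMAN_GENOME_BP = 3_000_000_000    # approximate human genome size (3000 x 10^6)


def runs_needed(genome_bp, coverage=1.0, bases_per_run=BASES_PER_RUN):
    """Number of runs needed to generate `coverage` times `genome_bp` raw bases."""
    return math.ceil(genome_bp * coverage / bases_per_run)


print(runs_needed(HUMAN_GENOME_BP))              # 1x coverage
print(runs_needed(HUMAN_GENOME_BP, coverage=8))  # 8x is an illustrative choice
```

Even at 1x coverage this comes to more than fifty thousand runs, which helps explain why throughput became the bottleneck.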

Page 31: Bioinformatica 29-09-2011-t1-bioinformatics

Basics of the “new” technology

• Get DNA.
• Attach it to something.
• Extend and amplify the signal with some color scheme.
• Detect the fluorochrome by microscopy.
• Interpret the series of spots as short strings of DNA.
• Strings are 30-300 letters long.
• Multiple images are interpreted as 0.4 to 1.2 GB/run (1,200,000,000 letters/day).
• Map or align the strings to one or many genomes.
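The last step, mapping short strings to a genome, can be sketched in a few lines of Python. This is a minimal illustration that assumes exact matches only and uses a simple k-mer index as a seed lookup; real read mappers tolerate mismatches and use far more compact indexes. The reference string, the reads, and the function names are made up for the example.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k):
    """Return all positions where `read` matches the reference exactly.

    The first k letters of the read act as a seed to look up candidate
    positions; each candidate is then verified against the full read.
    """
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


reference = "ACGTTGCAACGTAGGCTTACGATCGA"  # toy reference, not real data
index = build_index(reference, k=4)
for read in ["TTGCAACG", "ACGT", "GGGGGGGG"]:
    print(read, map_read(read, reference, index, k=4))
```

The short read ACGT maps to two positions while the longer read maps to one, a small-scale version of the ambiguity that short reads face on repetitive genomes.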

Page 32: Bioinformatica 29-09-2011-t1-bioinformatics

Next Generation Technologies

• 454
– Emulsion PCR
– Polymerase
– Natural nucleotides

• 20-100 Mb for 5-15k
– 1% error rate
– Homopolymers
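The slide lists homopolymers next to the error rate, presumably because runs of identical bases are hard to count when natural nucleotides are incorporated several at a time. As a small illustration, the following Python sketch locates homopolymer runs in a sequence; the minimum run length of 3 and the example sequence are arbitrary assumptions.

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=3):
    """Return (start, base, length) for every run of identical bases
    that is at least `min_length` long."""
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = len(list(group))
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


print(homopolymer_runs("ACGTTTTTGCAAAACGGT"))
```

Positions flagged this way are where a reader of 454 data might treat insertion or deletion calls with extra caution.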

Page 33: Bioinformatica 29-09-2011-t1-bioinformatics
Page 34: Bioinformatica 29-09-2011-t1-bioinformatics
Page 35: Bioinformatica 29-09-2011-t1-bioinformatics
Page 36: Bioinformatica 29-09-2011-t1-bioinformatics
Page 37: Bioinformatica 29-09-2011-t1-bioinformatics
Page 38: Bioinformatica 29-09-2011-t1-bioinformatics

One additional insight ...

Page 39: Bioinformatica 29-09-2011-t1-bioinformatics

Read Length is Not As Important For Resequencing

[Figure: % of paired K-mers with uniquely assignable location (0% to 100%) versus length of K-mer reads (8 to 20 bp), plotted for E. coli and human. Source: Jay Shendure]
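The idea behind the plot can be reproduced on a toy scale. The sketch below computes, for a given k, the fraction of k-mer positions in a sequence whose k-mer occurs exactly once. It is a simplification under stated assumptions: it uses single (unpaired) k-mers on one strand and a made-up toy sequence, whereas the figure refers to paired k-mers on real genomes.

```python
from collections import Counter


def fraction_unique_kmers(genome, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    unique_positions = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique_positions / len(kmers)


toy_genome = "ACGTACGTTTGACCAGTACGATTACGTAGG"  # toy sequence, not real data
for k in (2, 4, 6, 8):
    print(k, round(fraction_unique_kmers(toy_genome, k), 2))
```

On a real genome one would expect the fraction to rise with k and then level off, which is the sense in which extra read length matters less for resequencing.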

Page 40: Bioinformatica 29-09-2011-t1-bioinformatics

Two Short Read Technologies

• Illumina GA

• ABI SOLID

Page 41: Bioinformatica 29-09-2011-t1-bioinformatics

Technology Overview: Solexa/Illumina Sequencing

Page 42: Bioinformatica 29-09-2011-t1-bioinformatics
Page 43: Bioinformatica 29-09-2011-t1-bioinformatics
Page 44: Bioinformatica 29-09-2011-t1-bioinformatics
Page 45: Bioinformatica 29-09-2011-t1-bioinformatics
Page 46: Bioinformatica 29-09-2011-t1-bioinformatics
Page 47: Bioinformatica 29-09-2011-t1-bioinformatics

ABI Solid

Dressman 2003

Page 48: Bioinformatica 29-09-2011-t1-bioinformatics

ABI SOLID

Page 49: Bioinformatica 29-09-2011-t1-bioinformatics

ABI SOLID

Page 50: Bioinformatica 29-09-2011-t1-bioinformatics
Page 51: Bioinformatica 29-09-2011-t1-bioinformatics
Page 52: Bioinformatica 29-09-2011-t1-bioinformatics
Page 53: Bioinformatica 29-09-2011-t1-bioinformatics

Paired End Reads are Important!

Repetitive DNA

Unique DNA

Single read maps to multiple positions

Paired read maps uniquely

Read 1 Read 2

Known Distance
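The diagram's point can be shown with a minimal Python sketch: a read from a repeat maps to several positions on its own, but only one placement is consistent with its mate at the known distance. The assumptions are loud ones: exact matching, both reads on the same strand, and a made-up reference; real paired-end data has the second read on the opposite strand and an insert size that varies.

```python
def find_all(reference, read):
    """Return every start position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def map_pair(reference, read1, read2, distance, tolerance=0):
    """Return (pos1, pos2) placements where read2 starts `distance` letters
    after read1 starts, within `tolerance`."""
    placements = []
    for pos1 in find_all(reference, read1):
        for pos2 in find_all(reference, read2):
            if abs((pos2 - pos1) - distance) <= tolerance:
                placements.append((pos1, pos2))
    return placements


# Toy reference: the repeat "GATTACA" occurs twice; the flanking DNA is unique.
reference = "CCGATTACATTGGCAAGTGATTACAGGCTTAACGT"
repeat_read = "GATTACA"    # falls in repetitive DNA
unique_read = "GGCTTAAC"   # falls in unique DNA

print(find_all(reference, repeat_read))
print(map_pair(reference, repeat_read, unique_read, distance=7))
```

The first call reports two candidate positions for the repeat read; the second keeps only the placement that fits the known distance to its mate.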

Page 54: Bioinformatica 29-09-2011-t1-bioinformatics

Single Molecule Sequencing

Helicos Biosciences Corp.

Microscope slide

Single DNA molecule

dNTP-Cy3

primer

Super-cooled TIRF microscope

Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh

Page 55: Bioinformatica 29-09-2011-t1-bioinformatics

Introducing

NXT GNT DXS

Next Generation Diagnostics

18 September 2009

Wim Van Criekinge

Page 56: Bioinformatica 29-09-2011-t1-bioinformatics

Develop, in the shortest time frame, the best assay for the most relevant clinical application

Page 57: Bioinformatica 29-09-2011-t1-bioinformatics
Page 58: Bioinformatica 29-09-2011-t1-bioinformatics

NXT GNT DXS

• GNT
– Dedicated Team & Network
– Operational: Location
– Professionalized

• DXS
– Content engine
– Product 1 established
– Pipeline for n+1

• NXT
– Workflow management
– Bioinformatics
– Epigenetics

Page 59: Bioinformatica 29-09-2011-t1-bioinformatics

Next next generation sequencing

Third generation sequencing

Now sequencing

Page 60: Bioinformatica 29-09-2011-t1-bioinformatics

Complete Genomics

Page 61: Bioinformatica 29-09-2011-t1-bioinformatics

Complete Genomics

Page 62: Bioinformatica 29-09-2011-t1-bioinformatics

Pacific Biosciences: A Third Generation Sequencing Technology

Eid et al 2008

Page 63: Bioinformatica 29-09-2011-t1-bioinformatics

Pacific Biosciences: A Third Generation Sequencing Technology

Page 64: Bioinformatica 29-09-2011-t1-bioinformatics

Nanopore Sequencing

Page 65: Bioinformatica 29-09-2011-t1-bioinformatics

NCBI (educational resources)

Page 66: Bioinformatica 29-09-2011-t1-bioinformatics

Weblems

• What?
– Web-based problems (on the current lesson and/or in preparation for the next lesson)
• When?
– At the end of every lesson
• How?
– Solutions online via screencasts
– Practical
– Preparation for the practical exam ...

Not all problems necessarily require program code ...

Page 67: Bioinformatica 29-09-2011-t1-bioinformatics

Weblems

W1.1: To which phyla do the following species belong: (a) starfish, (b) ginkgo tree, (c) scorpion?

W1.2: What are the common names for the following species: (a) Orycteropus afer, (b) Beta vulgaris, (c) Macrocystis pyrifera?

W1.3: What species has the smallest known genome ? And is genome size related to number of genes ?

W1.4: What are the 5 latest genomes published ? How complete is “coverage” ?

W1.5: For approximately 10% of Europeans, the painkiller codeine is ineffective because the patients lack the enzyme that converts codeine into the active molecule, morphine. What is the most common mutation that causes this condition?