Algorithms in Computational Biology (236522) Spring 2002 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours Wednesday 1630-1730 TA: Ydo Wexler, Taub 431, tel 4927 Office hours Monday 1030-1130 Lecture: Tuesday 11:30-13:30, Taub 2 Tutorial: Monday 9:30-10:30, Taub 4


Algorithms in Computational Biology (236522) 

Spring 2002 

Lecturer: Shlomo Moran, Taub 639, tel 4363. Office hours: Wednesday 16:30-17:30

TA: Ydo Wexler, Taub 431, tel 4927. Office hours: Monday 10:30-11:30

Lecture: Tuesday 11:30-13:30, Taub 2

Tutorial: Monday 9:30-10:30, Taub 4


Course Information

Requirements & Grades:

15-25%: homework, five sets of theoretical questions (submission within two weeks). Homework is mandatory.

75-85%: final exam. The exam grade must be above 55 for the homework grade to count.

Exam date: 7.7.04.


Bibliography

Biological Sequence Analysis, R. Durbin et al., Cambridge University Press, 1998

Introduction to Computational Molecular Biology, J. Setubal, J. Meidanis, PWS Publishing Company, 1997

Phylogenetics, C. Semple, M. Steel, Oxford University Press, 2003

url: www.cs.technion.ac.il/~cs236522


Course Prerequisites

Computer Science and Probability Background:
• Data Structures 1 (cs234218)
• Algorithms 1 (cs234247)
• Probability (any course)

Some Biology Background:
• Formally: none, to allow CS students to take this course.
• Recommended: Molecular Biology 1 (especially for those in the Bioinformatics track) or a similar biology course, and/or a serious desire to complement your biology knowledge by reading the appropriate material (see the course web site).

Studying the algorithms in this course while acquiring enough biology background is far more rewarding than ignoring the biological context.


Biological Background

This lecture is adapted from Nir Friedman’s lecture, available at www.cs.huji.ac.il/~nir, with changes by Dan Geiger and then by Shlomo Moran.

First homework assignment: Read the first chapter (pages 1-30) of Setubal et al., 1997 (a copy is available in the Taub building library, and one is available for loan at Fishbach).

Solve questions 1-3 on p. 30 (to be posted on the course web site).

Due: the tutorial class of 22.3.04 (about two weeks from today), or earlier in the teaching assistant’s mail slot.


Computational Biology

Computational biology is the application of computational tools and techniques to (primarily) molecular biology. It enables new ways of studying the life sciences, allowing analytic and predictive methodologies that support and enhance laboratory work. It is a multidisciplinary field that combines biology, computer science, and statistics.

Computational biology is also called bioinformatics, although many practitioners define bioinformatics somewhat more narrowly, restricting the field to molecular biology only.


Examples of Areas of Interest

• Building evolutionary trees from molecular (and other) data
• Efficiently constructing genomes of various organisms
• Understanding the structure of genomes (SNP, SSR, genes)
• Understanding the function of genes in the cell cycle and in disease
• Deciphering the structure and function of proteins

SNP: Single Nucleotide Polymorphism
SSR: Simple Sequence Repeat


Exponential growth of biological information: growth of sequences, structures, and literature.


Course Goals

Learn about computational tools for (primarily) molecular biology.

Cover computational tasks that are posed by modern molecular biology

Discuss the biological motivation and setup for these tasks

Understand the kinds of solutions that exist and what principles justify them


Topics I

Dealing with DNA/protein sequences:
• Genome projects and how sequences are found
• Finding similar sequences
• Models of sequences: Hidden Markov Models
• Transcription regulation
• Protein families
• Gene finding


Topics II

Models of genetic change:
• Long term: evolutionary changes among species. Reconstructing evolutionary trees from sequences.
• Short term: genetic variations in a population. Finding genes by linkage and association.


Topics III (if time allows)

Protein world:
• How proteins fold: secondary and tertiary structure
• How to predict protein folds from sequence data
• How to analyze protein changes from raw experimental measurements (MassSpec)


Human Genome

Most human cells contain 46 chromosomes:

2 sex chromosomes (X, Y): XY in males, XX in females.

22 pairs of chromosomes called autosomes.

USER
What is an autosome, and what do the other terms mean?

DNA Organization

Source: Alberts et al.

USER
What are the circles in the second panel from the left?

The Double Helix

Source: Alberts et al.


DNA Components

Four nucleotide types: Adenine (A), Guanine (G), Cytosine (C), Thymine (T)

Base pairing through hydrogen bonds (an electrostatic attraction): A-T, C-G
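A minimal Python sketch of the A-T / C-G pairing rule: given one strand, it builds the partner strand. The function names are hypothetical, and the convention that strands are written 5'→3' (so the partner strand, read in its own direction, is the reverse complement of the given one) is an added assumption, not something stated on the slide.

```python
# Base pairing from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction.

    Assumption: both strands are written 5'->3'. Because the two strands of
    the helix run antiparallel, the partner is the complement read backwards.
    """
    return complement(strand)[::-1]

print(complement("ATGCGT"))          # TACGCA
print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying `reverse_complement` twice gives back the original strand, mirroring the fact that each strand fully determines the other.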


Genome Sizes

E. coli (bacterium): 4.6 × 10^6 bases

Yeast (a simple fungus): 15 × 10^6 bases

Smallest human chromosome: 50 × 10^6 bases

Entire human genome: 3 × 10^9 bases

USER
Is the bottom one the human chromosomes? If not, what is it?
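To put the genome sizes in computational terms, the short Python sketch below compares each genome with E. coli and estimates raw storage. The 2-bits-per-base figure is an assumption for illustration (four letters need log2(4) = 2 bits, with no compression and no ambiguity codes).

```python
# Genome sizes (in bases) as listed on the slide.
GENOME_SIZES = {
    "E. coli": 4.6e6,
    "Yeast": 15e6,
    "Smallest human chromosome": 50e6,
    "Entire human genome": 3e9,
}

# Assumption: A, C, G, T packed at log2(4) = 2 bits per base, no compression.
BITS_PER_BASE = 2

ecoli = GENOME_SIZES["E. coli"]
for name, bases in GENOME_SIZES.items():
    megabytes = bases * BITS_PER_BASE / 8 / 1e6
    print(f"{name:27s} {bases:9.2e} bases  "
          f"{bases / ecoli:7.1f}x E. coli  ~{megabytes:7.1f} MB")
```

Under these assumptions the entire human genome comes out at about 750 MB, roughly 650 times the size of the E. coli genome.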

Genetic Information

Genome – the collection of genetic information.

Chromosomes – storage units of genes.

Gene – the basic unit of genetic information. Genes determine inherited traits.


Genes

The DNA strings include:

Coding regions (“genes”):
• E. coli has ~4,000 genes
• Yeast has ~6,000 genes
• C. elegans has ~13,000 genes
• Humans have ~32,000 genes

Control regions: these are typically adjacent to the genes and determine when a gene should be “expressed”.

“Junk” DNA: unknown function (~90% of the DNA in human chromosomes).


The Cell

All cells of an organism contain the same DNA content (and the same genes), yet there is a variety of cell types.


Example: Tissues in the Stomach

How is this variety encoded and expressed?


Central Dogma

Gene → (transcription) → mRNA → (translation) → Protein

Cells express different subsets of their genes in different tissues and under different conditions.

(Hebrew terms on the slide: שעתוק = transcription, תרגום = translation)


Transcription

Coding sequences can be transcribed to RNA.

RNA nucleotides:
• Similar to DNA, with a slightly different backbone
• Uracil (U) instead of Thymine (T)

Source: Mathews & van Holde

USER
Please explain the small "pins".
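The U-for-T substitution can be sketched in a few lines of Python. This is a simplification under a stated assumption: the input is taken to be the coding (sense) strand written 5'→3', so the RNA has the same sequence with U in place of T. (The polymerase actually reads the complementary template strand, which yields the same result.) The function name is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a DNA coding strand.

    Simplifying assumption: the input is the coding (sense) strand written
    5'->3', so the RNA has the same sequence with uracil (U) in place of
    thymine (T).
    """
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```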

Transcription: RNA Processing (Splicing)

Exons hold the coding information; they are more conserved during evolution. This process takes place in the nucleus. The resulting mRNA molecules then pass through pores in the nuclear membrane into the cytoplasm.

1. Transcribe to RNA
2. Eliminate introns
3. Splice (connect) exons
* Alternative splicing exists
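Steps 2 and 3 amount to keeping the exon segments and concatenating them in order. The Python sketch below assumes exons are given as hypothetical 0-based, half-open (start, end) intervals; the toy transcript is made up, with exons in upper case and introns in lower case purely for readability. Choosing a different subset of exons from the same pre-mRNA illustrates alternative splicing.

```python
from typing import List, Tuple

def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate the exon segments of a pre-mRNA, dropping the introns.

    `exons` holds 0-based, half-open (start, end) intervals in transcript
    order (a hypothetical annotation format chosen for this sketch).
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy pre-mRNA: exons upper case, introns lower case (illustrative only).
pre = "AUGGCC" + "guaaag" + "CUUGGA" + "guacag" + "UAA"
exon1, exon2, exon3 = (0, 6), (12, 18), (24, 27)

print(splice(pre, [exon1, exon2, exon3]))  # AUGGCCCUUGGAUAA
# Alternative splicing: a different exon subset from the same pre-mRNA.
print(splice(pre, [exon1, exon3]))         # AUGGCCUAA
```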


RNA roles

Messenger RNA (mRNA): encodes protein sequences. Each three nucleotides (a codon) translate to one amino acid (the protein building block).

Transfer RNA (tRNA): decodes the mRNA molecules to amino acids. It binds to the mRNA on one side and holds the appropriate amino acid on its other side.

Ribosomal RNA (rRNA): part of the ribosome, a machine for translating mRNA to proteins. It catalyzes (like an enzyme) the reaction that attaches the amino acid carried by the tRNA to the amino-acid chain being created.

...


Translation

Translation is mediated by the ribosome, a complex of protein and rRNA molecules.

The ribosome attaches to the mRNA at a translation initiation site.

It then moves along the mRNA sequence and, in the process, constructs a sequence of amino acids (a polypeptide), which is released and folds into a protein.
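The ribosome's scan can be imitated in Python as a rough sketch. Two simplifying assumptions are made here and are not from the slides: the initiation site is taken to be the first AUG (real initiation depends on additional signals), and the standard genetic code is encoded compactly as a 64-letter string indexed by codon, with bases ordered U, C, A, G and `*` marking stop codons.

```python
# Standard genetic code in compact form: codon index = 16*i1 + 4*i2 + i3,
# with bases ordered U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Ribosome-like scan: start at the first AUG and read codons until a stop.

    Simplifying assumption: the first AUG is the initiation site.
    Returns one-letter amino-acid codes, or "" if there is no AUG.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: the chain is released
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate("GGAUGGCCUUGGAUUAAGC"))  # MALD
```

In the usage line the scan starts at position 2, reads AUG, GCC, UUG, GAU (Met, Ala, Leu, Asp) and stops at UAA.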


Genetic Code

There are 20 amino acids from which proteins are built.
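Since codons are triplets over a 4-letter alphabet, there are 4^3 = 64 codons for only 20 amino acids, so the code is redundant. The Python sketch below checks this count against the standard genetic code, written in the same compact 64-letter form (bases ordered U, C, A, G; `*` = stop). The encoding is our choice for illustration.

```python
from collections import Counter

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codons = [a + b + c for a in BASES for b in BASES for c in BASES]
code = dict(zip(codons, AMINO_ACIDS))

# How many codons encode each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in code.values() if aa != "*")
stop_codons = sorted(c for c, aa in code.items() if aa == "*")

print(len(code), "codons")                             # 64 codons
print(len(codons_per_amino_acid), "amino acids")       # 20 amino acids
print("stop codons:", stop_codons)                     # ['UAA', 'UAG', 'UGA']
print("codons for Leu (L):", codons_per_amino_acid["L"])  # 6
```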


Protein Structure

Proteins are polypeptides of 70-3000 amino acids.

This structure is (mostly) determined by the sequence of amino acids that make up the protein.

USER
Find some more information about this image.

Protein Structure


Evolution

Related organisms have similar DNA:
• Similarity in the sequences of proteins
• Similarity in the organization of genes along the chromosomes

Evolution plays a major role in biology.

Many mechanisms are shared across a wide range of organisms

During the course of evolution existing components are adapted for new functions


Evolution

Evolution of new organisms is driven by:
• Diversity: different individuals carry different variants of the same basic blueprint.
• Mutations: the DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
• Selection bias
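The three kinds of mutation listed above can be modeled as simple string edits. This is a toy Python sketch: the function names and the ancestral sequence are hypothetical, and positions are 0-based.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Single base change (point mutation) at 0-based position `pos`."""
    return seq[:pos] + base + seq[pos + 1:]

def insert(seq: str, pos: int, segment: str) -> str:
    """Insert a DNA segment before 0-based position `pos`."""
    return seq[:pos] + segment + seq[pos:]

def delete(seq: str, pos: int, length: int) -> str:
    """Delete `length` bases starting at 0-based position `pos`."""
    return seq[:pos] + seq[pos + length:]

ancestor = "ACGTTGCA"  # hypothetical ancestral sequence
print(substitute(ancestor, 2, "A"))  # ACATTGCA
print(insert(ancestor, 4, "GG"))     # ACGTGGTGCA
print(delete(ancestor, 1, 3))        # ATGCA
```

Sequence-comparison algorithms later in the course aim to recover such edits between related sequences.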


The Tree of Life

Source: Alberts et al.


Characters in Species

A (discrete) character is a property that distinguishes between species (e.g., dental structure, a certain gene).

A character’s state is a value of the character (e.g., human dental structure).

Problem: Given a set of species, specified by their characters, reconstruct their evolutionary tree.


Species ≡ Vertices

States ≡ Colors

Characters ≡ Colorings

Evolutionary tree ≡ A tree with many colorings, containing the given vertices
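The correspondence can be made concrete: a species-by-character table turns into one coloring per character, each mapping vertices (species) to colors (states). A minimal Python sketch; the species names, characters, and states are made up for illustration.

```python
# Hypothetical character matrix: species -> {character: state}.
species_states = {
    "s1": {"teeth": "present", "legs": 4},
    "s2": {"teeth": "absent", "legs": 4},
    "s3": {"teeth": "absent", "legs": 0},
}

def colorings(matrix):
    """Turn a species x character matrix into one coloring per character.

    Each coloring maps a vertex (species) to a color (that character's state).
    """
    characters = {c for states in matrix.values() for c in states}
    return {
        char: {species: states[char] for species, states in matrix.items()}
        for char in sorted(characters)
    }

for char, coloring in colorings(species_states).items():
    print(char, coloring)
```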


Evolutionary trees should avoid reversal transitions

A species regains a state that its direct ancestor has lost. Famous examples:

Teeth in birds. Legs in snakes.

An experiment reported in Science 80: producing teeth in chickens.

Evolutionary trees should avoid convergence transitions

Two species possess the same state while their least common ancestor possesses a different state.

Famous example: The marsupials.


The mammals on the right are marsupials. First all the marsupials diverged, and afterwards they converged to various traits similar to those of "regular" mammals.

Common assumption: characters with reversal or convergence transitions are highly unlikely in the evolutionary tree.

A character that exhibits neither reversals nor convergence is called homoplasy-free.


A character is homoplasy-free

↕

The corresponding coloring is convex (each color class induces a block, i.e., a connected subtree)
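Convexity of a total coloring can be checked directly. A minimal Python sketch, assuming the tree is given as an edge list and the coloring as a dict from every vertex to its color (both representations are our choice). For each color it runs a BFS that only steps onto vertices of that color; in a tree, a color class is a block exactly when this search reaches the whole class.

```python
from collections import defaultdict, deque

def is_convex(edges, coloring):
    """Return True if every color class induces a connected subtree (a block).

    `edges`: list of (u, v) pairs forming a tree.
    `coloring`: dict mapping every vertex of the tree to a color.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    classes = defaultdict(set)
    for vertex, color in coloring.items():
        classes[color].add(vertex)

    for color, members in classes.items():
        # BFS that never leaves the current color class.
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen and coloring.get(w) == color:
                    seen.add(w)
                    queue.append(w)
        if seen != members:  # part of the class is unreachable: not a block
            return False
    return True

path = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_convex(path, {"a": "R", "b": "R", "c": "B", "d": "B"}))  # True
print(is_convex(path, {"a": "R", "b": "B", "c": "R", "d": "B"}))  # False
```

In the second call, state R appears, disappears, and reappears along the path, which is exactly the kind of reversal or convergence pattern that convexity rules out.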


A partial coloring is convex if it can be completed to a (total) convex coloring
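For a single tree, whether a partial coloring can be completed can be tested without searching over completions. This sketch relies on a characterization that is not stated on the slide and is assumed here: a partial coloring is convex iff the smallest subtrees spanning the different color classes are pairwise vertex-disjoint. The argument is short. A convex completion must contain each class's spanning subtree inside that color's block, so the subtrees are disjoint. Conversely, if they are disjoint, repeatedly giving an uncolored vertex the color of a colored neighbour keeps every class connected. The Python code follows that argument; the tree-as-edge-list representation and the function names are our choices.

```python
from collections import defaultdict, deque

def spanning_subtree(adj, terminals):
    """Vertices of the smallest subtree of the tree containing all `terminals`."""
    root = next(iter(terminals))
    parent = {root: None}
    order = [root]
    for u in order:  # BFS; the list grows while we iterate over it
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
    keep = set(terminals)
    # Children follow their parent in BFS order, so walking backwards pulls in
    # every vertex on a path from a terminal up to the root (itself a terminal).
    for u in reversed(order):
        if u in keep and parent[u] is not None:
            keep.add(parent[u])
    return keep

def complete_convex(edges, partial):
    """Extend a partial coloring to a convex total coloring, or return None."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    classes = defaultdict(set)
    for vertex, color in partial.items():
        classes[color].add(vertex)

    total = {}
    for color, members in classes.items():
        for v in spanning_subtree(adj, members):
            if v in total:  # spanning subtrees of two colors overlap
                return None
            total[v] = color

    # Grow the classes outward; adding a neighbour keeps each class connected.
    queue = deque(total)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in total:
                total[w] = total[u]
                queue.append(w)
    return total

path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(complete_convex(path, {"a": "R", "c": "B", "e": "R"}))  # None
print(complete_convex(path, {"a": "R", "c": "B", "e": "B"}))
```

In the first call the subtree spanning the R vertices `a` and `e` passes through `c`, which is colored B, so no convex completion exists.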


The Perfect Phylogeny Problem

Input: a set of species and many characters, each assigning states (colors) to the species.

Question: is there a tree T containing the species as vertices, in which all the characters (colorings) are convex?


The Perfect Phylogeny Problem (combinatorial setting)

Input: Some colorings (C1,…,Ck) of a set of vertices (in the example: 3 colorings, left, center, and right, each using the same two colors).

Problem: Is there a tree T that includes these vertices, s.t. (T,Ci) is convex for i=1,…,k?

[Figure: the example colorings, with vertices colored R and B]

Complexity: NP-hard in general; in P for some special cases.
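One polynomial special case, sketched here as an illustration rather than as the course's algorithm: every character has exactly two states, as in the two-color example. For such characters, a perfect phylogeny (where the tree may add extra vertices) is generally known to exist iff every pair of characters is compatible. Two two-state characters are compatible iff at most three of the four state combinations occur among the species (the "four-gamete" condition). The 0/1 row representation and the function names are our assumptions; R/B colors map to 0/1.

```python
from itertools import combinations

def compatible_pair(col_a, col_b):
    """Two-state characters are compatible iff not all four state pairs occur."""
    return len(set(zip(col_a, col_b))) < 4

def binary_perfect_phylogeny_exists(matrix):
    """Pairwise-compatibility (four-gamete) test for two-state characters.

    `matrix`: list of rows, one per species; each row holds one 0/1 state
    per character. Assumes every character uses exactly two states.
    """
    columns = list(zip(*matrix))
    return all(compatible_pair(a, b) for a, b in combinations(columns, 2))

print(binary_perfect_phylogeny_exists([(0, 0), (0, 1), (1, 0)]))          # True
print(binary_perfect_phylogeny_exists([(0, 0), (0, 1), (1, 0), (1, 1)]))  # False
```

The test only inspects pairs of characters, so it runs in time polynomial in the number of species and characters. This is what makes the two-state case tractable, in contrast with the general problem.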