The rest of bioinformatics Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington [email protected]

Post on 28-Mar-2015


Page 1

The rest of bioinformatics

Prof. William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington

[email protected]

Page 2

One-minute responses

• I always like it when we ask questions and you first say good question, even though the question is not good.
• I liked the lecture although the concepts were a bit advanced for me.
• I understood about 90% of everything.
• The Python is more challenging but it is good to get confused sometimes.
• Python was more interesting!
• The comprehension of Python is improved at 95%.
• Today’s program (first one) was really challenging. I thought the second one was easier to understand.
• Python problem 3 was really challenging for me.
• The Python today was completely different from the rest and needed more time.
• Do your students at home write one-minute responses for the whole semester every day?
  – Yes.
• How did we discover the first mutation?
  – I am not sure I understand the question. We can observe mutations happening in microorganisms in the lab by sequencing their DNA from one generation to the next.
• Are you going to be readily available in future for consultations in case I get stuck?
  – Yes, you can always email me at [email protected].
• I do not think species are related because I believe in creation.

Page 3

Outline

• Parsimony
• Distance methods
  – Computing distances
  – Finding the tree
• Maximum likelihood

Page 4

Revision

• How do we compute the probability of observing this column, given this tree and an assumed model of evolution?

ACGCGTTGGG
ACGCGTTGGG
ACGCAATGAA
ACACAGGGAA

[Figure: a four-leaf tree whose leaves T, T, A, G are one column of the alignment, labeled Pr(column|tree,model)]

Page 5

Revision

• We enumerate all possible assignments to the internal nodes, compute the probability of each tree, and sum.

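[Figure: three copies of the tree with leaves T, T, A, G, each with a different assignment of bases to its internal nodes; the internal-node labels shown include A, C, and G]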

Page 6

Revision

• How do we compute the probability of observing this column, given this assigned tree and an assumed model of evolution?

ACGCGTTGGG
ACGCGTTGGG
ACGCAATGAA
ACACAGGGAA

[Figure: the tree with leaves T, T, A, G and internal nodes assigned T, A, A, labeled Pr(column|tree,model)]

Page 7

Revision

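[Figure: the assigned tree with leaves T, T, A, G and internal nodes T, A, A; base frequencies πA, πC, πG, πT; likelihood terms labeled L0 through L6]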

• We use our evolutionary model to assign a probability to each branch, and then take the product of the probabilities of the branches.

• L(tree) = L0 L1 L2 L3 L4 L5 L6

Page 8

Revision

• In maximum likelihood estimation, are mutations that occur on branches of a single tree considered independent or mutually exclusive events?
  – Independent.
• What do different labelings of internal nodes of a tree represent?
  – Different possible evolutionary histories.
• Are the different labelings independent or mutually exclusive?
  – Mutually exclusive.
• Are the columns of a multiple alignment considered independent or mutually exclusive?
  – Independent.

Page 9

Maximum likelihood revisited

for each possible tree
    for each column of the alignment
        for each assignment of internal nodes
            for each branch
                compute the probability of that branch
            assigned tree probability ← multiply branch probabilities
        column probability ← sum assigned tree probabilities
    tree probability ← multiply column probabilities
return the tree with the highest probability
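
The loop above can be sketched in Python for the four-sequence alignment from the revision slides. This is a minimal illustration under several assumptions that are not in the slides: the evolutionary model is Jukes-Cantor, every branch has the same length t, the root prior (taken here to be the L0 term) is uniform, and only the three ways of pairing four leaves are tried as candidate trees. The names jc_prob, column_likelihood, tree_log_likelihood and best_tree are made up for this example.

```python
import itertools
import math

BASES = "ACGT"
SEQS = ["ACGCGTTGGG", "ACGCGTTGGG", "ACGCAATGAA", "ACACAGGGAA"]

# Candidate trees (assumption): the three ways to pair up four leaves.
PAIRINGS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def jc_prob(a, b, t):
    """Jukes-Cantor probability that base a is replaced by base b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def column_likelihood(column, pairing, t=0.1):
    """Pr(column | tree, model): sum over all assignments of the three internal nodes."""
    (a, b), (c, d) = pairing
    total = 0.0
    for root, left, right in itertools.product(BASES, repeat=3):
        p = 0.25  # uniform prior on the root base (assumed to be L0)
        # Multiply the six branch probabilities of this assigned tree.
        p *= jc_prob(root, left, t) * jc_prob(root, right, t)
        p *= jc_prob(left, column[a], t) * jc_prob(left, column[b], t)
        p *= jc_prob(right, column[c], t) * jc_prob(right, column[d], t)
        total += p  # assigned trees are mutually exclusive, so add
    return total


def tree_log_likelihood(seqs, pairing, t=0.1):
    """Columns are independent, so multiply them (a sum in log space)."""
    return sum(math.log(column_likelihood(col, pairing, t)) for col in zip(*seqs))


def best_tree(seqs, t=0.1):
    """Return the candidate tree with the highest likelihood."""
    return max(PAIRINGS, key=lambda pairing: tree_log_likelihood(seqs, pairing, t))


if __name__ == "__main__":
    for pairing in PAIRINGS:
        print(pairing, tree_log_likelihood(SEQS, pairing))
    print("best:", best_tree(SEQS))
```

Working in log space avoids multiplying many small column probabilities together. A real program would also search over branch lengths and over many more trees; this sketch fixes both to keep the loop structure visible.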

Page 10

Sequence analysis tasks

• Protein structure prediction
• Remote homology detection
• Gene finding

Page 11

Protein structure prediction

• Given: amino acid sequence

• Return: protein structure

A complex of earthworm hemoglobin, composed of 144 globin chains.

Source: Protein Data Bank.

Page 12

Remote homology detection

• The hidden Markov model generalizes the PSSM used by PSI-BLAST.

• The model is trained using expectation-maximization.

[Figure: profile hidden Markov model with begin state B, end state E, match states M1 to M8, insert states I0 to I8, and delete states D1 to D8]

Page 13

Gene finding

Pedersen and Hein, Bioinformatics 2003.

Page 14

Mass spectrometry

• Spectrum identification
• Protein inference
• Biomarker discovery

Page 15

EAMPK

GDIFYPGYCPDVK

LPLENENQGK

ASVYNSFVSNGVK

YVMTFK

ENQGVVNR

Page 16

Biological networks

• Functional networks
• Protein-protein interaction networks
• Metabolic networks
• Regulatory networks

Page 17

Adai et al. JMB 340:179-190 (2004).

Page 18

Protein-protein interactions

• Each node is a protein.

• Each edge is a physical interaction.

• Edges are measured via
  – Yeast two-hybrid
  – TAP tagging plus MS/MS

Jeong et al. Nature. 2001.

Page 19

Regulatory networks

• Mammalian cell cycle.
• Colors represent different types of interactions
  – Black: binding
  – Red: covalent modifications and gene expression
  – Green: enzyme actions
  – Blue: stimulations and inhibitions

Kohn. Mol Cell Biol. 1999

Page 20

Metabolic networks

• Nodes are enzymes or metabolites.

• Edges represent interactions.

• This network represents the Arabidopsis TCA cycle.

Page 21

Gene expression

• Clustering
• Predictive modeling
• Clinical applications

Page 22

Gene expression matrix

The matrix entry at (i, j) is the expression level of gene i in experiment j.

[Figure: expression matrix with axes labeled Experiments and Genes]
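
As a small illustration of the (i, j) convention, the sketch below stores a made-up expression matrix as a list of rows, one row per gene, and compares genes by Pearson correlation, one common similarity measure for clustering expression profiles. The numbers and the helper name pearson are invented for this example; nothing here is taken from the slides' data.

```python
import math

# Made-up expression levels: rows are genes, columns are experiments.
expression = [
    [1.0, 2.0, 3.0, 4.0],  # gene 0
    [2.1, 3.9, 6.2, 7.8],  # gene 1: rises together with gene 0
    [4.0, 3.0, 2.0, 1.0],  # gene 2: falls as gene 0 rises
]


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    i, j = 1, 2
    print("gene", i, "in experiment", j, "=", expression[i][j])
    print("gene 0 vs gene 1:", pearson(expression[0], expression[1]))
    print("gene 0 vs gene 2:", pearson(expression[0], expression[2]))
```

Genes whose profiles correlate strongly across experiments are natural candidates for the same cluster.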

Page 23

Fibroblast gene clustering

• Cholesterol biosynthesis
• Cell cycle
• Immediate-early response
• Signaling and angiogenesis
• Wound healing and tissue remodeling

Iyer et al. “The transcriptional program in the response of human fibroblasts to serum.” Science. 283:83-7, 1999.

Page 24

Achieves >75% accuracy.

Page 25

Next generation sequencing

Next generation sequencing video

Page 26

Spaced seed alignment

• Tags and tag-sized pieces of reference are cut into small “seeds.”

• Pairs of spaced seeds are stored in an index.

• Look up spaced seeds for each tag.

• For each “hit,” confirm the remaining positions.

• Report results to the user.
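
The steps above can be sketched as follows. This is a simplified illustration, not the algorithm of any particular aligner. It assumes each tag is cut into four equal seeds and that up to two mismatches are allowed, so at least two seeds of a true match are error-free and one of the six seed pairs will hit the index. The function names seed_keys, build_index and align are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def seed_keys(window, n_seeds=4):
    """Cut a tag-sized string into seeds and yield one index key per pair of seeds."""
    size = len(window) // n_seeds
    seeds = [window[k * size:(k + 1) * size] for k in range(n_seeds)]
    for i, j in combinations(range(n_seeds), 2):
        yield (i, j, seeds[i], seeds[j])


def build_index(reference, tag_len):
    """Store the seed pairs of every tag-sized piece of the reference."""
    index = defaultdict(set)
    for pos in range(len(reference) - tag_len + 1):
        for key in seed_keys(reference[pos:pos + tag_len]):
            index[key].add(pos)
    return index


def align(tag, reference, index, max_mismatches=2):
    """Look up the tag's seed pairs, then confirm the remaining positions of each hit."""
    hits = set()
    for key in seed_keys(tag):
        hits |= index.get(key, set())
    results = []
    for pos in sorted(hits):
        window = reference[pos:pos + len(tag)]
        mismatches = sum(a != b for a, b in zip(tag, window))
        if mismatches <= max_mismatches:
            results.append((pos, mismatches))
    return results


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCAGTAGGCATCGA"  # toy reference, made up
    index = build_index(reference, tag_len=12)
    print(align("AGTTTGATCAGT", reference, index))  # tag with two substitutions
```

The index costs memory proportional to six keys per reference position, which is the price paid for tolerating mismatches without scanning the whole reference for every tag.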

Page 27

Burrows-Wheeler

• Store entire reference genome.

• Align tag base by base from the end.

• When tag is traversed, all active locations are reported.

• If no match is found, then back up and try a substitution.
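
A minimal sketch of this idea is below. It builds the Burrows-Wheeler transform of a toy reference with a naive suffix sort (fine for short strings, far too slow for a genome) and then extends the tag one base at a time from its last base. Instead of backing up only after a failed match, this version simply tries every substitution within a fixed budget, which is a simplification chosen for brevity. The names bwt_index and search are hypothetical.

```python
def bwt_index(reference):
    """Build a suffix array, first-column offsets and occurrence counts for reference."""
    text = reference + "$"  # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that sort strictly before c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][k]: number of occurrences of c in bwt[:k]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, first, occ


def search(tag, index, max_subs=0):
    """Return sorted reference positions matching tag with at most max_subs substitutions."""
    sa, first, occ = index
    hits = set()

    def extend(i, lo, hi, subs_left):
        if lo >= hi:  # no suffix is still active on this path
            return
        if i < 0:  # whole tag traversed: report all active locations
            hits.update(sa[lo:hi])
            return
        for c in "ACGT":
            if c not in occ:
                continue
            cost = 0 if c == tag[i] else 1
            if cost <= subs_left:
                extend(i - 1,
                       first[c] + occ[c][lo],
                       first[c] + occ[c][hi],
                       subs_left - cost)

    extend(len(tag) - 1, 0, len(sa), max_subs)
    return sorted(hits)


if __name__ == "__main__":
    index = bwt_index("ACGTACGTTTGACCAGTAGGCATCGA")  # toy reference, made up
    print(search("ACGT", index))
    print(search("GACCTG", index, max_subs=1))
```

Production aligners store sampled occurrence counts and a sampled suffix array rather than the full tables used here, to keep the index small.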

Page 28

Spliced-read mapping

• Used for processed mRNA data.
• Reports reads that span introns.
• Examples: TopHat, ERANGE

Page 29

Beyond the genome

• Epigenetics
• Chromatin state assignment
• Genome 3D architecture

Page 30

Next generation assays

ENCODE Project Consortium 2011. PLoS Biol 9:e1001046

Page 31

Rediscovering genes

Page 32

Page 33

Population genetics

• Genotype to phenotype
• Human disease genetics
• Population history

Page 34

jbiol.com

Human migrations

Page 35

Other topics

• Natural language processing
• Image analysis