Mutations · cs.wellesley.edu/~cs313/lectures/2_Mutations.pdf · 2019-08-14

B - 1

Mutations

B - 2

Tree Of Life

B - 3

Tree Of Life

B - 4

Mutations vs. Substitutions

• Mutations are changes in DNA

• Substitutions are mutations that evolution has tolerated

Which rate is greater?

Replicative proofreading and DNA repair constrain the mutation rate

B - 5

Selectionist Evolution

• Most mutations are deleterious; removed via negative selection

• Advantageous mutations positively selected

• Variability arises via selection

[Figure: mutation effects classified as deleterious, neutral, or beneficial]

B - 6

Why Are Mutations Important?

Mutations can be deleterious

Mutations drive evolution

B - 7

UV Damage to DNA

Thymine dimers

UV

What happens if damage is not repaired?

B - 8

Radiation Resistant

• 10 Gray will kill a human

• 60 Gray will kill an E. coli culture

• Deinococcus can survive 5000 Gray

B - 9

A Sequence Mutating at Random

Multiple substitutions at one site can cause underestimation of number of actual substitutions

[Figure: a sequence mutating at random, with labels 1 to 12 and * marks]

9 actual substitutions

p=5 observed substitutions

B - 10

Simulating Random Mutations

[Plot: axes "Substitutions" and "Sequence distance"; curves labeled "actual substitutions" and "observed substitutions"]
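The gap between actual and observed substitutions can be reproduced with a small simulation. The following is a minimal sketch, not the simulation behind the slide: the function name `simulate`, the sequence length, and the uniform choice of site and replacement base are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def simulate(length, n_substitutions, seed=0):
    """Apply n random substitutions to a random DNA sequence.

    Returns (actual, observed): the number of substitution events applied,
    and the number of sites that differ from the ancestor afterwards.
    """
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(length)]
    current = list(ancestor)
    for _ in range(n_substitutions):
        site = rng.randrange(length)
        # Always pick a different base, so every event is a real substitution.
        current[site] = rng.choice([b for b in BASES if b != current[site]])
    observed = sum(a != c for a, c in zip(ancestor, current))
    return n_substitutions, observed

for n in (10, 50, 100, 200, 400):
    actual, observed = simulate(length=100, n_substitutions=n)
    print(f"actual={actual:4d}  observed={observed:3d}")
```

Because repeated hits on one site count only once (or not at all, if the site reverts to its ancestral base), the observed count can never exceed the sequence length, while the actual count keeps growing.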

B - 11

Measuring Sequence Divergence: Why Do We Care?

• Inferring phylogenetic relationships

• Dating divergence, correlating with fossil record

• Use in sequence alignments and homology searches of databases*

* Comparative genomics is an important field: it asks not only how many substitutions exist between two sequences but also how similar the two sequences are.

B - 12

DNA Structure

[Figure: double-stranded DNA with paired bases (A-T, G-C), 5’ and 3’ strand ends, and OH groups]

G-C: 3 hydrogen bonds

A-T: 2 hydrogen bonds

Two base types:

- Purines (A, G)

- Pyrimidines (T, C)

B - 13

Not All Base Substitutions Are Created Equal

• Transitions

- Purine to purine (A ➔ G or G ➔ A)

- Pyrimidine to pyrimidine (C ➔ T or T ➔ C)

• Transversions

- Purine to pyrimidine (A ➔ C or T; G ➔ C or T)

- Pyrimidine to purine (C ➔ A or G; T ➔ A or G)

Transition rate ~2x transversion rate
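The two classes can be told apart by checking whether both bases belong to the same chemical family. The sketch below is illustrative only; the helper names `classify` and `count_ts_tv` are hypothetical, and it assumes two already-aligned, gap-free DNA sequences.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(a, b):
    """Return 'transition', 'transversion', or None if the bases are identical."""
    if a == b:
        return None
    # A transition stays within one family; a transversion crosses families.
    same_family = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "transition" if same_family else "transversion"

def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        kind = classify(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv

print(count_ts_tv("ACGTACGT", "GCGTTCAT"))  # (2, 1)
```

In this toy pair, A➔G and G➔A are transitions and A➔T is a transversion, giving two transitions and one transversion.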

B - 14

Substitution Rates Differ Across Genomes

Alignment of 3,165 human-mouse pairs

[Figure labels: start of transcription, splice sites, polyadenylation site]

B - 15

The PAM Model of Protein Sequence Evolution

• Empirical data-based substitution matrix

• Global alignments of 71 families of closely related proteins.

• Constructed hypothetical evolutionary trees

• Built a matrix of 1572 point accepted mutations (PAMs) between amino acids

B - 16

Original PAM Substitution Matrix

Dayhoff, 1978

Count number of times residue i was replaced with residue j

B - 17

Deriving PAM Matrices

For each amino acid, calculate its relative mutability, i.e., the likelihood that the amino acid will mutate:

$$m_j = \frac{\text{number of times amino acid } j \text{ mutated}}{\text{total occurrences of amino acid } j}$$
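As a minimal sketch of the relative mutability ratio, the snippet below divides mutation counts by occurrence counts per amino acid. The function name and the counts are made up for illustration and are not Dayhoff's data.

```python
def relative_mutability(times_mutated, total_occurrences):
    """m_j = (# times amino acid j mutated) / (total occurrences of amino acid j)."""
    return {
        aa: times_mutated.get(aa, 0) / total_occurrences[aa]
        for aa in total_occurrences
    }

# Made-up counts for three amino acids, for illustration only.
times_mutated = {"A": 30, "N": 40, "W": 2}
total_occurrences = {"A": 300, "N": 200, "W": 100}

m = relative_mutability(times_mutated, total_occurrences)
for aa, value in m.items():
    print(aa, round(value, 3))
```

With these toy numbers, N is the most mutable residue and W the least.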

B - 18

Deriving PAM Matrices

Calculate mutation probabilities for each possible substitution

Mi,j = relative mutability of j x proportion of all substitutions of j that change it to i

$$M_{i,j} = \frac{m_j \, A_{i,j}}{\sum_i A_{i,j}}$$
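The off-diagonal formula can be sketched directly from a count matrix A and the mutabilities m. Everything here is illustrative: the counts are made up, the function name is hypothetical, and the diagonal entry is an assumption, since the slide does not give one. The sketch sets it to 1 - m_j so that each column sums to 1.

```python
def mutation_probabilities(A, m):
    """Build M[i][j], the probability that amino acid j is replaced by i.

    A[i][j] holds substitution counts between residues i and j (A[j][j] is ignored).
    m[j] is the relative mutability of residue j.
    """
    residues = list(m)
    M = {i: {} for i in residues}
    for j in residues:
        column_total = sum(A[i][j] for i in residues if i != j)
        for i in residues:
            if i == j:
                # Assumption: no diagonal is given on the slide;
                # 1 - m_j makes the column sum to 1.
                M[i][j] = 1.0 - m[j]
            else:
                M[i][j] = m[j] * A[i][j] / column_total
    return M

# Made-up symmetric counts and mutabilities for a three-letter alphabet.
A = {
    "A": {"A": 0, "N": 6, "W": 2},
    "N": {"A": 6, "N": 0, "W": 4},
    "W": {"A": 2, "N": 4, "W": 0},
}
m = {"A": 0.010, "N": 0.020, "W": 0.005}

M = mutation_probabilities(A, m)
print(M["N"]["A"], M["W"]["A"], M["A"]["A"])
```

The off-diagonal entries in column j add up to m_j, which is why the assumed diagonal of 1 - m_j yields a proper probability distribution per column.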

B - 19

PAM1 Mutation Probability Matrix

Dayhoff, 1978

B - 20

Deriving PAM Matrices

Calculate log odds ratio to convert mutation probability to substitution score

$$S_{i,j} = 10 \times \log_{10}\left(\frac{M_{i,j}}{f_i}\right)$$

- Mi,j: mutation probability (prob. substitution from j to i is an accepted mutation)

- fi: frequency of residue i (probability of amino acid i occurring by chance)
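A short sketch of the score calculation, using made-up probabilities rather than values from the PAM1 matrix. The function name is hypothetical, and no rounding is applied because the slide does not specify any.

```python
import math

def log_odds_score(M_ij, f_i):
    """S_ij = 10 * log10(M_ij / f_i)."""
    return 10 * math.log10(M_ij / f_i)

# Made-up values: a substitution seen twice as often as chance predicts,
# one seen five times less often, and one seen exactly as often as chance.
print(round(log_odds_score(0.10, 0.05), 2))  # 3.01
print(round(log_odds_score(0.01, 0.05), 2))  # -6.99
print(round(log_odds_score(0.05, 0.05), 2))  # 0.0
```

A ratio above 1 gives a positive score, a ratio below 1 gives a negative score, and a ratio of exactly 1 gives zero.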

B - 21

Deriving PAM Matrices

Scoring in log odds ratio:

- Allows addition of scores for residues in alignments

Interpretation of score:

- Positive: non-random (accepted mutation) favored

- Negative: random model favored
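Because the logarithm of a product is the sum of the logarithms, multiplying per-position odds ratios is the same as adding per-position scores. The sketch below sums scores over a gapless alignment; the score table is a toy example with invented values, and `alignment_score` is a hypothetical helper name.

```python
def alignment_score(seq1, seq2, scores):
    """Sum per-position substitution scores over a gapless alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("gapless alignment requires equal-length sequences")
    return sum(scores[(a, b)] for a, b in zip(seq1, seq2))

# Toy scores for a two-letter alphabet, invented for illustration.
toy_scores = {
    ("A", "A"): 2, ("W", "W"): 5,
    ("A", "W"): -3, ("W", "A"): -3,
}

total = alignment_score("AWAA", "AWWA", toy_scores)
print(total)  # 6
```

Here the three matches contribute 2 + 5 + 2 and the single A/W mismatch contributes -3, so the alignment as a whole still favors the non-random model.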

B - 22

PAM Matrix

B - 23

Using PAM Scoring Matrices

PAM1: 1% difference (99% identity)

Can “evolve” the mutation probability matrix by multiplying it by itself, then take log odds ratio

(PAMn = PAM1 mutation probability matrix raised to the n-th power, i.e. a product of n copies)
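The "evolving" step is ordinary matrix multiplication. The sketch below uses a made-up 2x2 matrix as a stand-in for the 20x20 PAM1 mutation probability matrix, with each column summing to 1; the helper names are hypothetical, and plain Python lists are used so the example has no dependencies.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(M, n):
    """Return M raised to the n-th power (n >= 1) by repeated multiplication."""
    result = M
    for _ in range(n - 1):
        result = mat_mul(result, M)
    return result

# Made-up column-stochastic matrix standing in for PAM1 probabilities.
pam1 = [[0.99, 0.02],
        [0.01, 0.98]]

pam250 = mat_power(pam1, 250)
for row in pam250:
    print([round(x, 3) for x in row])
```

The columns of the result still sum to 1, and the diagonal shrinks as n grows, reflecting the lower expected identity at larger evolutionary distances. The log odds ratio would then be taken on the evolved probabilities, not on the PAM1 scores.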

B - 24

BLOSUM = BLOCKS Substitution Matrix

• Like PAM, BLOSUM matrices are empirical protein substitution matrices that use log odds ratios to calculate substitution scores

• Large database: local alignments of conserved regions of distantly related proteins

[Figure: gapless alignment blocks]

B - 25

BLOSUM Uses Clustering To Reduce Sequence Bias

• Cluster the most similar sequences together

• Reduce weight of contribution of clustered sequences

• BLOSUM number refers to the percent-identity clustering threshold used (e.g. 62% for the BLOSUM 62 matrix)
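The clustering idea in the bullets above can be sketched as follows. This is a simplified illustration, not the BLOSUM procedure itself: it assumes equal-length, distinct sequences from one gapless block, uses a greedy one-pass clustering, and gives each sequence a weight of 1 / (cluster size). All function names and the example block are hypothetical.

```python
def percent_identity(a, b):
    """Percent of identical positions between two equal-length sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def cluster(sequences, threshold):
    """Greedy clustering: a sequence joins the first cluster that contains
    any member at or above the identity threshold, else starts a new one."""
    clusters = []
    for seq in sequences:
        for members in clusters:
            if any(percent_identity(seq, other) >= threshold for other in members):
                members.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters

def sequence_weights(sequences, threshold):
    """Weight each sequence by 1 / (size of its cluster).

    Assumes the sequences are distinct, since they are used as dict keys.
    """
    weights = {}
    for members in cluster(sequences, threshold):
        for seq in members:
            weights[seq] = 1.0 / len(members)
    return weights

# Made-up block: three near-identical sequences and one unrelated sequence.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIRV", "MNPQRSTVWY"]
weights = sequence_weights(block, threshold=62)
for seq, w in weights.items():
    print(seq, round(w, 3))
```

With a 62% threshold the three similar sequences share one cluster and together count as much as the single divergent sequence, which is the bias reduction the slide describes.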

B - 26

BLOSUM and PAM Substitution Matrices

[Figure: BLOSUM and PAM matrices compared by divergence]

BLOSUM (% identity): BLOSUM 30, BLOSUM 62, BLOSUM 90

PAM (% change in parentheses): PAM 250 (80), PAM 120 (66), PAM 90 (50)

BLAST algorithm uses the BLOSUM 62 matrix by default for protein searches

B - 27

PAM and BLOSUM

PAM

• Smaller set of closely related proteins - short evolutionary period

• Use global alignment

• More divergent matrices extrapolated

• Errors arise from extrapolation

BLOSUM

• Larger set of more divergent proteins - longer evolutionary period

• Use local alignment

• Each matrix calculated separately

• Clustering to avoid bias

• Errors arise from alignment errors

B - 28

Importance of Scoring Matrices

• Scoring matrices appear in all analyses involving sequence comparison

• The choice of matrix can strongly influence the outcome of the analysis