-
Amino Acids and Their Properties
-
Recap: ss-rRNA and mutations
Ribosomal RNA (rRNA) evolves very slowly
Much slower than proteins
ss-rRNA is typically used
So by aligning ss-rRNA of one organism with that of another
We can estimate relatedness
-
Amino Acid Substitutions
Recall we can align DNA & RNA sequences
What does that mean?
We can also align two amino acid sequences
Can 2 nucleotides partially match?
Can 2 amino acids partially match?
-
Amino Acid Substitutions
Aligning sequences
Can 2 nucleotides partially match?
Are some nucleotide mutations more significant than others?
Can 2 amino acids partially match?
Are some amino acid mismatches more significant than others?
-
Amino Acid Substitutions
Can 2 nucleotides partially match?
Significance of a nucleobase mutation
Does name matter?
Does location matter?
Can 2 amino acids partially match?
Significance of an amino acid mutation
Name? Location?
-
Sequence matching and evolution rate
Proteins tend to evolve slower than DNA
Many DNA changes have no effect on a protein
A changed codon may map to the same amino acid
Non-coding DNA changes may have no effect
What does this mean for gauging the relatedness of
humans and chimpanzees?
humans and fish?
-
Sequence matching and evolution rate
Ribosomal RNA (rRNA) evolves very slowly
Much slower than proteins
What might rRNA matching be good for measuring the relatedness of?
humans and chimpanzees?
humans and fish?
humans and what?
-
Sequence matching and evolution rate
Ribosomal RNA (rRNA) evolves very slowly
Much slower than proteins
ss-rRNA is typically used
(what's that?)
However, different regions of ss-rRNA mutate at different rates
(Ribosome images next)
-
The Ribosome
Source: www.buzzle.com/articles/ribosomes-function.html
-
Ribosomes: diagrams and images
...check images.google.com for:
Ribosome diagram
Ribosome structure
Videos include http://www.youtube.com/watch?v=ID7tDAr39Ow
-
Recap: ss-rRNA and mutations
Ribosomal RNA (rRNA) evolves very slowly
Much slower than proteins
ss-rRNA is typically used
So by aligning ss-rRNA of one organism with that of another
We can estimate relatedness
-
Relatedness and Mutations
Much DNA mutates relatively quickly
Much ss-rRNA mutates relatively slowly
Much protein mutates at intermediate rates
Let's focus on protein mutation next
-
Amino acid substitutions
Some amino acid substitutions are more likely than others
Why?
-
Amino acid substitutions
Some amino acid substitutions are more likely than others
Why?
Some are closer to others in terms of nucleobase codons
Some are closer in terms of resulting protein function
-
Amino acid substitutions II
Substituting similar ones is likely to retain the protein structure and function
Substituting dissimilar ones is likely to change the protein structure and function
Similarity of amino acids means what?
-
Amino acid substitutions III
Similarity of amino acids means similar physicochemical properties
Physicochemical: concerning the physical and chemical; concerning physical chemistry
Physical chemistry: connecting macroscopic properties of substances with their molecular properties
-
Amino acid physicochemical properties
Nonpolar (hydrophobic): ACFGILMPVW
Polar (hydrophilic): NQSTY
Aromatic: FHWY (side chains containing an aromatic ring)
Basic: HKR
Acidic: DE
(See http://www.bio.davidson.edu/courses/genomics/jmol/aatable.html)
By way of contrast, can anyone think of a non-physicochemical property of some amino acids?
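One way to make "similar physicochemical properties" concrete is to store the groups from this slide as sets and ask which groups two amino acids share. This is only a sketch: the group memberships are copied from the slide, while the function names `properties` and `shared_properties` are made up for illustration.

```python
# Property groups exactly as listed on the slide (one-letter codes).
GROUPS = {
    "nonpolar": set("ACFGILMPVW"),
    "polar": set("NQSTY"),
    "aromatic": set("FHWY"),
    "basic": set("HKR"),
    "acidic": set("DE"),
}


def properties(residue):
    """Return the names of every group that contains this residue."""
    return {name for name, members in GROUPS.items() if residue in members}


def shared_properties(a, b):
    """Return the groups two residues have in common (may be empty)."""
    return properties(a) & properties(b)


print(sorted(properties("F")))              # groups containing phenylalanine
print(sorted(shared_properties("D", "E")))  # two acidic residues
print(sorted(shared_properties("D", "K")))  # acidic vs. basic
```

Note that a residue can sit in more than one group (F is both nonpolar and aromatic on this slide), which is why the functions return sets rather than a single label.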
-
Aromatic
Special type of ring-shaped molecule
Characterized by an unusual stabilizing property
Aliphatic
Non-aromatic
-
Amino acid abbrevs.
G=glycine, P=proline, T=threonine, A=alanine, but why the following?
F=phenylalanine
Y=tyrosine
N=asparagine
Q=glutamine
W=tryptophan
-
Scoring protein sequence alignments
Simple way:
Two matching (identical) amino acids score 1
Two mismatching (non-identical) ones score 0
Goal: maximize % of matching amino acids
Works well for very similar sequences
Example:
CADQH
CADPM
Alignment score=___
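The simple match/mismatch scheme can be sketched in a few lines. The function name `identity_score` is hypothetical, and the sketch assumes the two sequences are already aligned and have equal length (no gaps).

```python
def identity_score(seq1, seq2):
    """Score 1 for each identical aligned pair, 0 for each mismatch."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a == b)


seq1, seq2 = "CADQH", "CADPM"
score = identity_score(seq1, seq2)
percent = 100 * score / len(seq1)
print(score, percent)
```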
-
Scoring protein sequence alignments II
The simple way ignores degree of similarity
Better to account for degree of similarity!
Solution: substitution matrices
PAM (Accepted Point Mutation, but PAM is easier to say than APM) matrix
Developed in the 1970s by Margaret Dayhoff
PAM1 matrix:
Answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?
-
Scoring protein sequence alignments II
Substitution matrices
PAM (Accepted Point Mutation, but PAM is easier to say than APM) matrix
PAM1 matrix:
Answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?
PAM2 matrix:
Not 2%! Rather, 1%, twice
What is the difference?
-
Scoring protein sequence alignments II
Substitution matrices
PAM (Accepted Point Mutation, but PAM is easier to say than APM) matrix
PAM1 matrix:
Answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?
PAM250 matrix:
Not 250%, obviously
Why obviously?
It is 1%, repeated 250 times!
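A rough way to see why "1%, n times" is not the same as "n%": if each round hits a random 1% of positions, some positions get hit more than once. The sketch below assumes, purely for illustration, that every position is equally likely to be hit and that rounds are independent; the real PAM model instead uses per-amino-acid substitution probabilities. The function name is hypothetical.

```python
def fraction_hit_at_least_once(n_rounds, rate=0.01):
    """Fraction of positions mutated at least once after n_rounds rounds,

    assuming each round independently hits each position with
    probability `rate` (a simplifying assumption, not the PAM model).
    """
    return 1 - (1 - rate) ** n_rounds


for n in (1, 2, 250):
    print(n, round(fraction_hit_at_least_once(n), 4))
```

Even under this crude assumption the fraction can never exceed 100%, and positions that are hit twice may even change back to the original amino acid.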
-
Scoring protein sequence alignments II
Substitution matrices
PAM (Accepted Point Mutation, but PAM is easier to say than APM) matrix
PAM1 matrix:
Answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?
PAM250 matrix:
It is 1%, repeated 250 times!
BLOSUM matrix is a popular type also
-
Scoring protein sequences: PAM250
Here is PAM250
source: http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/PAM250matrix.gif
CADQH
CADPM
Alignment score=?
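Scoring with a substitution matrix is just a table lookup per aligned pair, followed by a sum. The sketch below hard-codes only the five PAM250 entries this example needs; they are transcribed from the commonly distributed PAM250 table and should be checked against the matrix image on the slide. The names `PAM250_SUBSET`, `pair_score`, and `alignment_score` are made up for illustration.

```python
# Only the entries needed for CADQH vs. CADPM. Values are an assumption
# transcribed from the commonly published PAM250 table; verify against
# the slide's matrix. Keys are stored in alphabetical order.
PAM250_SUBSET = {
    ("C", "C"): 12,
    ("A", "A"): 2,
    ("D", "D"): 4,
    ("P", "Q"): 0,
    ("H", "M"): -2,
}


def pair_score(a, b):
    """Look up the score for one aligned pair (the matrix is symmetric)."""
    return PAM250_SUBSET[tuple(sorted((a, b)))]


def alignment_score(seq1, seq2):
    """Sum the pair scores over an ungapped alignment."""
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(alignment_score("CADQH", "CADPM"))
```

Unlike the simple 1/0 scheme, a match on a rare, rarely substituted amino acid such as C contributes far more than a match on A, and a mismatch can score zero or negative.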
-
Scoring protein sequences: BLOSUM62 (default in Blast 2.0)
Source=http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/pairwise.html.
-
Why do self substitutions have the highest numbers?
-
Why use PAM, BLOSUM, etc.?
Sequence similarity is related to evolutionary distance
Simple base matching (match/not) may work OK for closely related organisms (humans and chimps, for example)
Amino acid matching works better as evolutionary distance increases (why?)
We'd like to be able to assess relatedness of organisms that diverged long ago (humans and worms, for example)
-
Relatedness Long Ago
See images.google.com for
domains of life
We still are not sure, but the 3-domain system seems likely
But cladistics demands binary splits, so
3 domains requires 2 splits, and
2 domains are more related than the 3rd
-
Why use PAM, BLOSUM, etc.? (II)
Organisms that diverged long ago have divergent analogous amino acid sequences
Since different amino acid substitutions occur at different frequencies we can measure relatedness back farther
e.g. when the fraction of identical amino acids is surprisingly low
and the fraction of identical base pairs is even lower
-
Comparing Sequences with PAMs (+ recap)
-
What does PAM mean?
PAM is considered an acronym for Point Accepted Mutation
Accepted Point Mutation (original)
Percent Accepted Mutations
A point mutation is a substitution of 1 amino acid for another
An accepted mutation is one that is passed down through the generations
Will a mutation be accepted if it is helpful? Harmful? Neutral? Helpful in some circumstances, harmful in others?
-
What Does PAM Mean, cont.
PAM has two meanings
PAM is a unit of evolutionary time
PAM is a kind of substitution matrix
(The meanings are related)
-
PAM as a Unit of Time
A PAM is the amount of evolutionary change resulting in:
1 amino acid mutation per 100 amino acids
It is an average over >>100 amino acids
because mutations have randomness
After 1 PAM, will an organism have exactly 1% of its amino acids different from what they started out as?
-
PAM, Evolution, and Gaps
PAM ignores
Insertions
Deletions
Silent nucleotide substitutions (which are?)
PAM counts a change from A to B and back to A as 2 accepted point mutations
2 sequences 200 PAMs apart will have about 25% of amino acids the same!
-
PAM Matrices
They describe substitutability of amino acids, based on empirical evidence
Empirical = experiential
The matrices are derived from repositories of actual homologous sequences
A PAM 1 matrix is geared to best compare 2 sequences that are 1 PAM apart
A PAM 250 matrix is good for comparing quite diverged sequences
PAM 250 matrix is standard
-
Creating a PAM Matrix
Let $f_i$ be the frequency of amino acid $i$
We express $f_i$ as a fraction of the total
$f_i = \frac{\text{instances of } i}{\text{instances of any amino acid}}$
Frequencies range from
0.091 (L) down to 0.014 (W)
The most common amino acid occurs about ____ times more commonly than the least
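The frequency definition above can be sketched as a simple count over a set of sequences. The two input sequences here are just the toy example from earlier slides, not real frequency data, and `frequencies` is a hypothetical helper name. The last line computes the ratio of the two extreme frequencies quoted on the slide.

```python
from collections import Counter


def frequencies(sequences):
    """Return f_i = (instances of i) / (instances of any amino acid)."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Toy input only; real frequencies come from large sequence databases.
freqs = frequencies(["CADQH", "CADPM"])
print(freqs["C"], sum(freqs.values()))

# Ratio of the most common (L) to the least common (W) frequency on the slide.
ratio = 0.091 / 0.014
print(round(ratio, 1))
```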
-
Creating PAM matrix, cont.
Determine mutabilities of the amino acids
Some amino acids tend to change easily
Others not
If alanine's mutability is set to 100
Serine's mutability is 117 (highest, 1991 data)
Tryptophan's mutability is 25 (lowest, 1991)
Let's look more closely at $m_i$ . . .
-
Creating PAM matrix, cont.
Mutability is a number
Given an evolutionary interval of 1 PAM
let $m_i = \frac{\text{\# mutations of amino acid } i}{\text{\# instances of amino acid } i}$
Alternatively,
$m_i = p(\text{an instance of } i \text{ mutates})$
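As a sketch of the mutability definition, the code below divides mutation counts by instance counts, then rescales so that alanine is 100, as on the earlier slide. All counts are hypothetical numbers chosen for easy arithmetic, and the function names are made up.

```python
def mutability(mutation_counts, instance_counts):
    """m_i = (# mutations of amino acid i) / (# instances of amino acid i)."""
    return {
        aa: mutation_counts.get(aa, 0) / instance_counts[aa]
        for aa in instance_counts
    }


def relative_to_alanine(m):
    """Rescale mutabilities so that alanine's value is 100."""
    return {aa: 100 * value / m["A"] for aa, value in m.items()}


# Hypothetical counts over an interval of 1 PAM (not real data).
instances = {"A": 1000, "S": 1000, "W": 1000}
mutations = {"A": 10, "S": 12, "W": 3}

m = mutability(mutations, instances)
print(m)
print(relative_to_alanine(m))
```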
-
Are the formulas on the previous slide identical?
-
Creating PAM matrix, cont.
Next, we break $m_i$ into constituent $m_{i,j}$'s
That is, i mutates, but into j at what rate?
Use actual data from observed mutations
Populate a matrix of probabilities
-
The Diagonal
Values on the matrix diagonal do not really describe i mutating into itself!
(In reality, can that happen?)
They basically show
p (i does not mutate)
Thus, the columns add up to 1
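The last two slides can be sketched together: split each mutability among the possible replacement amino acids in proportion to observed substitution counts, and put p(i does not mutate) on the diagonal. This sketch assumes column i describes what the original amino acid i becomes, which matches the statement that columns add up to 1. The three-letter alphabet, the counts, and the name `build_matrix` are all hypothetical.

```python
def build_matrix(alphabet, m, counts):
    """Build a matrix M where M[row j][column i] = p(i is replaced by j).

    m[i] is the mutability of i; counts[(i, j)] is how often i was
    observed to change into j. The diagonal holds 1 - m[i].
    """
    index = {aa: k for k, aa in enumerate(alphabet)}
    n = len(alphabet)
    M = [[0.0] * n for _ in range(n)]
    for i in alphabet:
        total = sum(counts.get((i, j), 0) for j in alphabet if j != i)
        for j in alphabet:
            if j == i:
                M[index[i]][index[i]] = 1 - m[i]  # p(i does not mutate)
            elif total:
                # Share of i's mutability that goes to j.
                M[index[j]][index[i]] = m[i] * counts.get((i, j), 0) / total
    return M


# Hypothetical toy data (not real mutation counts).
alphabet = "ASW"
m = {"A": 0.01, "S": 0.012, "W": 0.003}
counts = {("A", "S"): 8, ("A", "W"): 2, ("S", "A"): 9,
          ("S", "W"): 3, ("W", "A"): 1, ("W", "S"): 2}

M = build_matrix(alphabet, m, counts)
column_sums = [sum(M[r][c] for r in range(3)) for c in range(3)]
print(column_sums)
```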
-
Is the matrix on the last slide
Symmetric? Are there about 1% changed?
-
PAM0
What do you think a PAM 0 matrix might look like?
-
PAMn
Use matrix multiplication
PAM2 = PAM1 x PAM1
PAM3 = PAM2 x PAM1
PAM250? Do it 250 times!
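Repeated matrix multiplication is easy to sketch in plain Python. The 3x3 matrix below is a hypothetical stand-in for a real 20x20 PAM1 probability matrix (its columns sum to 1), and `matmul`, `identity`, and `pam_n` are illustrative names.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def identity(n):
    """Identity matrix: every amino acid stays itself with probability 1."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def pam_n(pam1, n):
    """Multiply pam1 by itself n times; n = 0 gives the identity matrix."""
    result = identity(len(pam1))
    for _ in range(n):
        result = matmul(result, pam1)
    return result


# Hypothetical toy "PAM1": each column sums to 1.
toy_pam1 = [[0.990, 0.009, 0.001],
            [0.008, 0.988, 0.002],
            [0.002, 0.003, 0.997]]

toy_pam250 = pam_n(toy_pam1, 250)
print([round(sum(row[c] for row in toy_pam250), 6) for c in range(3)])
print([round(toy_pam250[k][k], 3) for k in range(3)])
```

In this sketch the columns still sum to 1 after 250 multiplications, while the diagonal entries shrink, reflecting the growing chance that a position has changed.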
-
PAM∞
What do you imagine a PAM∞ matrix might look like?
-
Logarithmicize
Actually, we take logarithms to get the usual matrix from the probability matrices
First, build another, reference matrix of expected probabilities
Assume all amino acids are equally mutable
Also assume they mutate into each other in proportion to their frequencies
(I.e., overall amino acid frequencies are maintained, but otherwise they don't care what they mutate into)
-
Logarithmicize
Now we have two matrices
Make a 3rd. Each entry is: $\frac{\text{observed probability}}{\text{expected probability}}$
We're comparing reality to what we would see if mutations were truly random
Take the log of each entry to make a 4th
An entry of 1 means 10x more mutations of that type than expected
An entry of -1 means what?
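The ratio-then-logarithm step can be sketched as follows, using base-10 logs so that an entry of 1 corresponds to a ratio of 10, as stated on the slide. The 2x2 matrices hold hypothetical probabilities chosen to give round ratios, and `log_odds` is an illustrative name.

```python
import math


def log_odds(observed, expected):
    """Return log10(observed / expected) for each matrix entry."""
    return [[math.log10(o / e) for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(observed, expected)]


# Hypothetical probabilities (not real data).
observed = [[0.020, 0.001],
            [0.005, 0.030]]
expected = [[0.002, 0.010],
            [0.005, 0.003]]

scores = log_odds(observed, expected)
print([[round(s, 2) for s in row] for row in scores])
```

A positive entry marks a substitution seen more often than chance predicts, zero marks one seen exactly as often, and a negative entry marks one seen less often.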
-
Carrying On
We now use the matrix to measure relative evolutionary distance