amino acids and their properties - university...

Source: ualr.edu/jdberleant/courses/BINF4445+5445/lecture5anotes.pdf


  • Amino Acids and Their Properties

  • Recap: ss-rRNA and mutations

    Ribosomal RNA (rRNA) evolves very slowly

    Much slower than proteins

    ss-rRNA is typically used

    So by aligning ss-rRNA of one organism with that of another

    We can estimate relatedness

  • Amino Acid Substitutions

    Recall we can align DNA & RNA sequences

    What does that mean?

    We can also align two amino acid sequences

    Can 2 nucleotides partially match?

    Can 2 amino acids partially match?

  • Amino Acid Substitutions

    Aligning sequences

    Can 2 nucleotides partially match?

    Are some nucleotide mutations more significant than others?

    Can 2 amino acids partially match?

    Are some amino acid mismatches more significant than others?

  • Amino Acid Substitutions

    Can 2 nucleotides partially match?

    Significance of a nucleobase mutation

    Does name matter?

    Does location matter?

    Can 2 amino acids partially match?

    Significance of an amino acid mutation

    Name? Location?

  • Sequence matching and evolution rate

    Proteins tend to evolve more slowly than DNA

    Many DNA changes have no effect on a protein

    A changed codon may map to the same amino acid

    Non-coding DNA changes may have no effect

    What does this mean for gauging the relatedness of

    humans and chimpanzees?

    humans and fish?
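
    The point that a changed codon may map to the same amino acid can be sketched in a few lines. This is a minimal illustration: the table holds only a handful of entries from the standard genetic code, and the names CODON_TABLE and classify_change are made up for this example.

```python
# Partial standard genetic code (DNA codons); only the entries needed here.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "GAT": "D", "GAC": "D",                          # aspartate
    "GAA": "E", "GAG": "E",                          # glutamate
}


def classify_change(codon_before: str, codon_after: str) -> str:
    """Label a codon change as silent or as an amino acid substitution."""
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    return "silent" if before == after else f"{before}->{after}"


# Both changes alter only the third base of the codon.
print(classify_change("GCT", "GCC"))  # alanine stays alanine
print(classify_change("GAT", "GAA"))  # aspartate becomes glutamate
```

    A DNA-level comparison counts both changes; a protein-level comparison sees only the second.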

  • Sequence matching and evolution rate

    Ribosomal RNA (rRNA) evolves very slowly

    Much slower than proteins

    What might rRNA matching be good for measuring the relatedness of?

    humans and chimpanzees?

    humans and fish?

    humans and what?

  • Sequence matching and evolution rate

    Ribosomal RNA (rRNA) evolves very slowly

    Much slower than proteins

    ss-rRNA is typically used

    (what's that?)

    However, different regions of ss-rRNA mutate at different rates

    (Ribosome images next)

  • The Ribosome

    Source: www.buzzle.com/articles/ribosomes-function.html

  • Ribosomes: diagrams and images

    ...check images.google.com for:

    Ribosome diagram

    Ribosome structure

    Videos include http://www.youtube.com/watch?v=ID7tDAr39Ow

  • Recap: ss-rRNA and mutations

    Ribosomal RNA (rRNA) evolves very slowly

    Much slower than proteins

    ss-rRNA is typically used

    So by aligning ss-rRNA of one organism with that of another

    We can estimate relatedness

  • Relatedness and Mutations

    Much DNA mutates relatively quickly

    Much ss-rRNA mutates relatively slowly

    Much protein mutates at intermediate rates

    Let's focus on protein mutation next

  • Amino acid substitutions

    Some amino acid substitutions are more likely than others

    Why?

  • Amino acid substitutions

    Some amino acid substitutions are more likely than others

    Why?

    Some are closer to others in terms of nucleobase codons

    Some are closer in terms of resulting protein function

  • Amino acid substitutions II

    Substituting similar ones is likely to retain the protein structure and function

    Substituting dissimilar ones is likely to change the protein structure and function

    Similarity of amino acids means what?

  • Amino acid substitutions III

    Similarity of amino acids means similar physicochemical properties

    Physicochemical: concerning the physical and chemical; concerning physical chemistry

    Physical chemistry: connecting macroscopic properties of substances with their molecular properties

  • Amino acid physicochemical properties

    Nonpolar (hydrophobic): ACFGILMPVW

    Polar (hydrophilic): NQSTY

    Aromatic: FHWY (side chains containing aromatic rings)

    Basic: HKR

    Acidic: DE

    (See http://www.bio.davidson.edu/courses/genomics/jmol/aatable.html)

    By way of contrast, can anyone think of a non-physicochemical property of some amino acids?
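
    One rough way to make "partial match" concrete is to ask which of the groups listed on this slide two amino acids share. The sketch below simply encodes the slide's grouping; the names GROUPS and shared_groups are invented for the example, and other textbooks draw the group boundaries differently.

```python
# Groups exactly as listed on the slide (one-letter amino acid codes).
GROUPS = {
    "nonpolar": set("ACFGILMPVW"),
    "polar": set("NQSTY"),
    "aromatic": set("FHWY"),
    "basic": set("HKR"),
    "acidic": set("DE"),
}


def shared_groups(aa1: str, aa2: str) -> list[str]:
    """Return the names of the groups that contain both amino acids."""
    return sorted(
        name for name, members in GROUPS.items()
        if aa1 in members and aa2 in members
    )


print(shared_groups("D", "E"))  # both acidic
print(shared_groups("F", "W"))  # both aromatic and nonpolar
print(shared_groups("D", "K"))  # no group in common
```

    A substitution within a shared group (D for E) is a plausible candidate for a "partial match"; one across groups (D for K) is not.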

  • Aromatic

    Special type of ring-shaped molecule

    Characterized by an unusual stabilizing property

    Aliphatic

    Non-aromatic

  • Amino acid abbrevs.

    G=glycine, P=proline, T=threonine, A=alanine, ... but why the following?

    F=phenylalanine

    Y=tyrosine

    N=asparagine

    Q=glutamine

    W=tryptophan

  • Scoring protein sequence alignments

    Simple way:

    Two matching (identical) amino acids score 1

    Two mismatching (non-identical) ones score 0

    Goal: maximize % of matching amino acids

    Works well for very similar sequences

    Example:

    CADQH

    CADPM

    Alignment score=___
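
    The simple scoring rule above (1 for a match, 0 for a mismatch) can be sketched as follows. The function name identity_score is an illustrative choice, and the sketch assumes the two sequences are already aligned, with equal length and no gaps.

```python
def identity_score(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x == y)


seq_a, seq_b = "CADQH", "CADPM"
score = identity_score(seq_a, seq_b)
percent_identity = 100 * score / len(seq_a)
print(score, percent_identity)
```

    Note that this rule treats the Q/P and H/M mismatches as equally bad, whatever the chemistry of the residues involved.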

  • Scoring protein sequence alignments II

    Simple way ignores degree of similarity

    Better to account for degree of similarity!

    Solution: substitution matrices

    PAM (Accepted Point Mutation, but PAM is easier to say than APM) matrix

    Developed in 1970s by Margaret Dayhoff

    PAM1 matrix:

    answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?

  • Scoring protein sequence alignments II

    Substitution matrices

    PAM (Accepted Point Mutation, but PAM is easier to say than APM) matrix

    PAM1 matrix:

    answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?

    PAM2 matrix:

    Not 2%! Rather, 1%, twice

    What is the difference?

  • Scoring protein sequence alignments II

    Substitution matrices

    PAM (Accepted Point Mutation, but PAM is easier to say than APM) matrix

    PAM1 matrix:

    answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?

    PAM250 matrix:

    Not 250%, obviously

    Why obviously?

    It is 1%, repeated 250 times!

  • Scoring protein sequence alignments II

    Substitution matrices

    PAM (Accepted Point Mutation, but PAM is easier to say than APM) matrix

    PAM1 matrix:

    answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?

    PAM250 matrix:

    It is 1%, repeated 250 times!

    BLOSUM matrices are another popular type

  • Scoring protein sequences: PAM250

    Here is PAM250

    source: http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/PAM250matrix.gif

    CADQH

    CADPM

    Alignment score=?
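
    Scoring with a substitution matrix means looking up each aligned pair and summing the entries. The sketch below is illustrative only: the five values are transcribed from the commonly circulated PAM250 table and should be treated as an assumption to check against the figure cited on this slide, and the names PAM250_SUBSET and matrix_score are invented here.

```python
# A few PAM250 entries, keyed by the alphabetically sorted pair.
# ASSUMPTION: values copied from the widely distributed PAM250 table;
# verify them against the matrix figure before relying on them.
PAM250_SUBSET = {
    ("C", "C"): 12,
    ("A", "A"): 2,
    ("D", "D"): 4,
    ("P", "Q"): 0,
    ("H", "M"): -2,
}


def matrix_score(seq_a: str, seq_b: str, matrix: dict) -> int:
    """Sum substitution-matrix entries over an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for x, y in zip(seq_a, seq_b):
        key = tuple(sorted((x, y)))  # the matrix is symmetric
        total += matrix[key]
    return total


pam250_score = matrix_score("CADQH", "CADPM", PAM250_SUBSET)
print(pam250_score)
```

    Unlike the 1/0 rule, the three identical positions contribute different amounts, and the two mismatches are no longer scored alike.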

  • Scoring protein sequences: BLOSUM62 (default in Blast 2.0)

    Source=http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/pairwise.html.

  • Why do self substitutions have the highest numbers?

  • Why use PAM, BLOSUM, etc.?

    Sequence similarity is related to evolutionary distance

    Simple base matching (match/not) may work OK for closely related organisms (humans and chimps, for example)

    Amino acid matching works better as evolutionary distance increases (why?)

    We'd like to be able to assess relatedness of organisms that diverged long ago (humans and worms, for example)

  • Relatedness Long Ago

    See images.google.com for

    domains of life

    We still are not sure, but the 3-domain system seems likely

    But cladistics demands binary splits, so

    3 domains requires 2 splits, and

    2 domains are more related than the 3rd

  • Why use PAM, BLOSUM, etc.? (II)

    Organisms that diverged long ago have divergent analogous amino acid sequences

    Since different amino acid substitutions occur at different frequencies we can measure relatedness back farther

    e.g. when the fraction of identical amino acids is surprisingly low

    and the fraction of identical base pairs is even lower

  • Comparing Sequences with PAMs (+ recap)

  • What does PAM mean?

    PAM is considered an acronym for:

    Point Accepted Mutation

    Accepted Point Mutation (original)

    Percent Accepted Mutations

    A point mutation is a substitution of 1 amino acid for another

    An accepted mutation is one that is passed down through the generations

    Will a mutation be accepted if it is helpful? Harmful? Neutral? Helpful in some circumstances, harmful in others?

  • What Does PAM Mean, cont.

    PAM has two meanings

    PAM is a unit of evolutionary time

    PAM is a kind of substitution matrix

    (The meanings are related)

  • PAM as a Unit of Time

    A PAM is the amount of evolutionary change resulting in:

    1 amino acid mutation per 100 amino acids

    It is an average over >>100 amino acids

    because mutations have randomness

    After 1 PAM, will an organism have exactly 1% of its amino acids different from what they started out as?

  • PAM, Evolution, and Gaps

    PAM ignores

    Insertions

    Deletions

    Silent nucleotide substitutions (which are?)

    PAM counts a change from A to B and back to A as 2 accepted point mutations

    2 sequences 200 PAMs apart will have about 25% of amino acids the same!

  • PAM Matrices

    They describe substitutability of amino acids, based on empirical evidence

    Empirical = experiential

    The matrices are derived from repositories of actual homologous sequences

    A PAM 1 matrix is geared to best compare 2 sequences that are 1 PAM apart

    A PAM 250 matrix is good for comparing quite diverged sequences

    PAM 250 matrix is standard

  • Creating a PAM Matrix

    Let $f_i$ be the frequency of amino acid $i$

    We express $f_i$ as a fraction of the total

    $f_i = \frac{\text{instances of } i}{\text{instances of any amino acid}}$

    Frequencies range from

    0.091 (L) down to 0.014 (W)

    The most common amino acid occurs about ____ times more commonly than the least
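
    The frequency definition above can be sketched directly. The two short sequences are toy data reused from the alignment example, not a real sequence database, and amino_acid_frequencies is an illustrative name. The last lines compute the ratio of the two frequencies quoted on the slide.

```python
from collections import Counter


def amino_acid_frequencies(sequences: list[str]) -> dict[str, float]:
    """f_i = (instances of i) / (instances of any amino acid)."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Toy input only; real frequencies come from large sequence collections.
freqs = amino_acid_frequencies(["CADQH", "CADPM"])
print(freqs)

# Ratio of the most common (L) to the least common (W) frequency on the slide.
ratio = 0.091 / 0.014
print(round(ratio, 1))
```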

  • Creating PAM matrix, cont.

    Determine mutabilities of the amino acids

    Some amino acids tend to change easily

    Others not

    If alanine's mutability is set to 100

    Serine's mutability is 117 (highest, 1991 data)

    Tryptophan's mutability is 25 (lowest, 1991)

    Let's look more closely at $m_i$ . . .

  • Creating PAM matrix, cont.

    Mutability is a number

    Given an evolutionary interval of 1 PAM

    let $m_i = \frac{\#\text{ mutations of amino acid } i}{\#\text{ instances of amino acid } i}$

    Alternatively,

    $m_i = p(\text{an instance of } i \text{ mutates})$

  • Are the formulas on the previous slide identical?

  • Creating PAM matrix, cont.

    Next, we break $m_i$ into constituent $m_{i,j}$'s

    That is, i mutates, but into j at what rate?

    Use actual data from observed mutations

    Populate a matrix of probabilities

  • The Diagonal

    Values on the matrix diagonal do not really describe i mutating into itself!

    (In reality, can that happen?)

    They basically show

    p (i does not mutate)

    Thus, the columns add up to 1
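
    The construction described on the last few slides can be sketched on a toy three-letter alphabet. Everything here is an assumption made for illustration: the frequencies and pair counts are invented, only the relative mutabilities (100, 117, 25) come from the slides, and the scaling step (choosing the mutabilities so that 1% of positions change overall) is one common way to realize "an evolutionary interval of 1 PAM".

```python
ALPHABET = ["A", "S", "W"]
FREQ = {"A": 0.5, "S": 0.4, "W": 0.1}            # hypothetical frequencies
REL_MUTABILITY = {"A": 100, "S": 117, "W": 25}   # relative values from the slides
PAIR_COUNTS = {("A", "S"): 30, ("A", "W"): 2, ("S", "W"): 3}  # hypothetical


def pair_count(x: str, y: str) -> int:
    """Observed substitutions between x and y (direction ignored)."""
    return PAIR_COUNTS[tuple(sorted((x, y)))]


def build_pam1(alphabet, freq, rel_mut):
    """Return matrix[new][old] = p(old is replaced by new) over 1 PAM."""
    # Scale mutabilities so that, averaged over composition, 1% of positions change.
    scale = 0.01 / sum(freq[a] * rel_mut[a] for a in alphabet)
    matrix = {new: {} for new in alphabet}
    for old in alphabet:
        m_old = scale * rel_mut[old]  # p(an instance of old mutates)
        total = sum(pair_count(old, other) for other in alphabet if other != old)
        for new in alphabet:
            if new == old:
                matrix[new][old] = 1 - m_old  # diagonal: p(old does not mutate)
            else:
                matrix[new][old] = m_old * pair_count(old, new) / total
    return matrix


pam1 = build_pam1(ALPHABET, FREQ, REL_MUTABILITY)
for old in ALPHABET:
    print(old, sum(pam1[new][old] for new in ALPHABET))  # each column sums to 1
```

    Each column splits $m_i$ among the possible replacements in proportion to the observed counts, and the diagonal takes whatever probability is left.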

  • Is the matrix on the last slide

    Symmetric? Are there about 1% changed?

  • PAM0

    What do you think a PAM 0 matrix might look like?

  • PAMn

    Use matrix multiplication

    PAM2 = PAM1 x PAM1

    PAM3 = PAM2 x PAM1

    PAM250? Do it 250 times!
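
    Repeated multiplication can be sketched in plain Python. The PAM1 used here is a toy stand-in, not the real matrix: it assumes 20 amino acids, each changing with probability 1% and changing into any of the other 19 with equal probability. The function names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def toy_pam1(n=20, change=0.01):
    """Toy PAM1: every amino acid changes with probability `change`,
    spread evenly over the other n - 1 amino acids (an assumption)."""
    off_diagonal = change / (n - 1)
    return [
        [1 - change if i == j else off_diagonal for j in range(n)]
        for i in range(n)
    ]


def pam(n_steps, pam1):
    """PAMn = PAM1 multiplied by itself n_steps times in total."""
    result = pam1
    for _ in range(n_steps - 1):
        result = mat_mul(result, pam1)
    return result


pam1 = toy_pam1()
pam2 = pam(2, pam1)
pam250 = pam(250, pam1)

changed_after_2 = 1 - pam2[0][0]    # slightly under 0.02
unchanged_after_250 = pam250[0][0]  # well above 0, despite "250%" of change
print(changed_after_2, unchanged_after_250)
```

    In this toy model "1%, twice" leaves slightly fewer than 2% of positions different, because a position can change and then change back; after 250 steps a noticeable fraction of positions still match. The real PAM250 numbers differ because real substitutions are not uniform.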

  • PAM∞

    What do you imagine a PAM∞ matrix might look sort of like?

  • Logarithmicize

    Actually, we take logarithms to get the usual matrix from the probability matrices

    First, build another, reference matrix of expected probabilities

    Assume all amino acids are equally mutable

    Also assume they mutate into each other in proportion to their frequencies

    (I.e., overall amino acid frequencies are maintained, but otherwise they don't care what they mutate into)

  • Logarithmicize

    Now we have two matrices

    Make a 3rd. Each entry is: $\frac{\text{observed probability}}{\text{expected probability}}$

    We're comparing reality to what we would see if mutations were truly random

    Take the log of each entry to make a 4th

    An entry of 1 means 10x more mutations of that type than expected

    An entry of -1 means what?
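
    The log step can be sketched with made-up numbers. The probabilities below are hypothetical, chosen only so that the ratios come out to 10 and 1/10; log_odds is an illustrative name, and base-10 logarithms are used because the slide equates an entry of 1 with a 10x excess.

```python
import math


def log_odds(observed: float, expected: float) -> float:
    """Base-10 log of the observed-to-expected probability ratio."""
    return math.log10(observed / expected)


# Hypothetical probabilities for two kinds of substitution.
more_than_expected = log_odds(observed=0.02, expected=0.002)    # ratio 10
less_than_expected = log_odds(observed=0.0002, expected=0.002)  # ratio 1/10
as_expected = log_odds(observed=0.002, expected=0.002)          # ratio 1

print(more_than_expected, less_than_expected, as_expected)
```

    Positive entries mark substitutions seen more often than chance predicts, negative entries mark those seen less often, and 0 marks no difference from the random reference.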

  • Carrying On

    We now use the matrix to measure relative evolutionary distance