Sequence analysis: what is a sequence?
• Linear arrangement of chemical subunits
• Contains information: 3-D arrangement determined by the sequence; 3-D defines function
Gene
• Specific sequence of DNA (a molecule)
• In vivo, found in one specific place
• Confers some trait, i.e., a unit of information
• How is this information used?
– passed on to the next generation
– put into a form useful for doing cellular work
Nucleic acid sequences store information
• Linear arrangement of chemical subunits; chemistry confers direction
• 3-D arrangement determined by sequence; 3-D arrangement defines function
• DNA: subunits = nucleotides A, T, G, C
Sources of sequence information
• Chemical reactions on polymers (sequential degradation into monomers)
• Translation (more later)
Sequencing gel -- primary source of sequence data
• Automated sequencing uses the Sanger method (also called the dideoxy or enzymatic method)
• Relies on the enzyme DNA polymerase and chemically modified nucleotides = dideoxynucleotides (supplied as ddNTPs)
• When a dideoxynucleotide is added to a growing DNA chain, the chain stops; set [ddNTP] so that, across the population of chains, termination occurs at every occurrence of A, T, G, or C
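As a rough illustration of the chain-termination idea, the sketch below treats each dideoxy reaction as a "lane" holding one fragment for every position where that base occurs in the newly synthesized strand, then reads the sequence back by sorting fragments by length. It is a deterministic toy model, not a simulation of real gel data, and the names `termination_lanes` and `read_gel` are made up for this example.

```python
def termination_lanes(new_strand):
    """Return, for each base, the lengths of fragments that end in that base.

    Toy assumption: termination is observed exactly once at every
    occurrence of the base, so each position yields one fragment length.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest fragment to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = termination_lanes("AGTCAGTC")
print(lanes)
print(read_gel(lanes))
```

Because every position ends up in exactly one lane, sorting the bands by length recovers the strand that the polymerase synthesized.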
In-class exercise I: nucleic acid polymer
• 1) draw chemical bonds of sequence AGTCAGTC
• 2) predict complementary sequence
• 3) sketch 3-D structure
• 4) view sequence of actual gene in GCG
• 5) view 3-D structure in file
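For step 2, a minimal sketch of base pairing may help when checking an answer. It assumes standard Watson-Crick pairing (A with T, G with C) and that the input is written 5'→3'; the function names are illustrative only.

```python
# Watson-Crick base pairs for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(dna):
    """Partner strand base by base; read left to right it runs 3'->5'."""
    return "".join(COMPLEMENT[base] for base in dna)


def reverse_complement(dna):
    """Partner strand rewritten in the conventional 5'->3' direction."""
    return complement(dna)[::-1]


seq = "AGTCAGTC"
print(complement(seq))
print(reverse_complement(seq))
```

The two outputs describe the same strand; they differ only in the direction in which it is written, which is why direction has to be stated alongside a sequence.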
Protein sequences store information
• Directional sequence of subunits = amino acids, 20 of them abbreviated as letters
• Function depends on structure, which depends on sequence
• Proteins (enzymes) do the work of life; work defined by sequence
5 10 15 20 25 30
1 A A S X D X S L V E V H X X V F I V P P X I L Q A V V S I A
31 T T R X D D X D S A A A S I P M V P G W V L K Q V X G S Q A
61 G S F L A I V M G G G D L E V I L I X L A G Y Q E S S I X A
91 S R S L A A S M X T T A I P S D L W G N X A X S N A A F S S
121 X E F S S X A G S V P L G F T F X E A G A K E X V I K G Q I
151 T X Q A X A F S L A X L X K L I S A M X N A X F P A G D X X
181 X X V A D I X D S H G I L X X V N Y T D A X I K M G I I F G
211 S G V N A A Y W C D S T X I A D A A D A G X X G G A G X M X
241 V C C X Q D S F R K A F P S L P Q I X Y X X T L N X X S P X
271 A X K T F E K N S X A K N X G Q S L R D V L M X Y K X X G Q
301 X H X X X A X D F X A A N V E N S S Y P A K I Q K L P H F D
331 L R X X X D L F X G D Q G I A X K T X M K X V V R R X L F L
361 I A A Y A F R L V V C X I X A I C Q K K G Y S S G H I A A X
391 G S X R D Y S G F S X N S A T X N X N I Y G W P Q S A X X S
421 K P I X I T P A I D G E G A A X X V I X S I A S S Q X X X A
451 X X S A X X A
A protein sequence
So what?
In-class exercise II: translation
• Given the RNA sequence UUUUGUAGACUUCAUCGACCC, predict the amino acid sequence coded for.
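One way to check an answer to this exercise is a small translation sketch. It assumes the standard genetic code, a reading frame that starts at the first base, and no requirement for a start codon; stop codons are shown as `*`. The helper name `translate` is illustrative.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G
# for the first, second, and third codon positions.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate RNA codon by codon from the first base; '*' marks a stop."""
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate("UUUUGUAGACUUCAUCGACCC"))
```

Shifting the start by one or two bases gives a different reading frame and a different protein, which is worth trying by hand as well.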
In-class exercise III: protein chemistry
• Draw chemical bonds of protein sequence KRETWA
Information Theory Primer – Tom Schneider
• http://www-lmmb.ncifcrf.gov/~toms/paper/primer/latex/index.html
Evolution -- general principles
• Individuals in a population of any species vary in many heritable traits
• Any population of a species has the potential to produce far more offspring than the environment can support; this leads to competition
• Individuals with traits favorable to winning the competition will reproduce more, leading to higher representation of such traits in the population (natural selection)
Genetics and evolution
• Evolution happens in populations, not in individuals
• Variability seen in populations is a result of genetics, especially sexual recombination
• Variability of populations is nonlinear
Molecular evolution
• DNA changes lead to protein changes
• Protein changes can lead to new functions
• Molecular changes are linear: accumulation of mutations over time
• Mixing of different forms of molecules = sexual recombination; but sexual recombination does not affect the molecules
[Figure: Tracing hemophilia in the royal houses of Europe. Pedigree labels: Victoria, Queen of England; Prince Albert; King Edward VII; Duke Leopold; Princess Alice; Princess Beatrice; grandson Alexis, Tsarevich of Russia; Alfonso, Crown Prince of Spain; present British royals (unaffected).]
Molecular evolution: what changes actually happen?
• Substitutions
• Deletions, insertions
• Rearrangements (inversions, transpositions)
• Repeats
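The change types listed above can be sketched as string operations on a single strand. This is a simplification: positions are 0-based, the function names are invented for the example, and a real inversion in double-stranded DNA yields the reverse complement of the segment rather than a plain reversal.

```python
def substitute(seq, pos, base):
    """Replace the base at pos with another base."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq, start, length):
    """Remove `length` bases beginning at start."""
    return seq[:start] + seq[start + length:]


def insert(seq, pos, fragment):
    """Insert a fragment before position pos."""
    return seq[:pos] + fragment + seq[pos:]


def invert(seq, start, end):
    """Reverse the segment seq[start:end] in place (simplified inversion)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def duplicate(seq, start, end):
    """Create a tandem repeat of the segment seq[start:end]."""
    return seq[:end] + seq[start:end] + seq[end:]


seq = "AGTCAGTC"
print(substitute(seq, 2, "A"))
print(delete(seq, 2, 3))
print(insert(seq, 4, "TTT"))
print(invert(seq, 2, 6))
print(duplicate(seq, 0, 4))
```

Substitutions keep the length fixed, while the other operations shift or reorder positions, which is why alignment is needed before two sequences can be compared position by position.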
Similarity I
• Quantifiable attribute: e.g., % identity or alignment score
• Evolutionarily related regions will be similar in some measurable way (structure or sequence); similar regions are not necessarily evolutionarily related
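To make "% identity" concrete, here is a minimal sketch that scores two sequences that have already been aligned, with `-` marking a gap. The choice of denominator (all alignment columns, gaps included) is an assumption of this example; other conventions exist and give different numbers.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences share a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Skip columns where both sequences have a gap; they carry no information.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


print(percent_identity("AGTCAGTC", "AGTCTGTC"))
print(percent_identity("AGT-AGTC", "AGTCAGTC"))
```

Both examples differ in one column out of eight, one by a substitution and one by a gap, so they score the same under this convention.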
Similarity II
• High degrees of sequence similarity (30% identity) indicate an evolutionary relationship
• Intermediate degrees of sequence similarity (15% identity) don't necessarily
• Evolutionarily related molecules may show low degrees of sequence similarity (but high structural similarity)
Homology
• Homology is the conclusion from similarity data that structures and/or sequences share a common evolutionary pathway
• Divergence from a common ancestor via substitutions, deletions, insertions, etc.
• Conserved regions indicate sequences/structures important to function
Modular nature of proteins
• Many proteins are modular: some regions have one evolutionary pathway, others have another; the different regions interact to form a new function
• Example: NOS
NOS modular structure
PDZ domain
Oxygenase domain
CaM site
FMN binding domain
45 amino acid insert
FAD binding domain
NADPH binding domain
Caveats to homology
• Very closely related species might not have had time to diverge -- high similarity doesn't indicate importance to function
• Evolutionary relationships evident in sequence give history, but are not always relevant to current function
• Convergent evolution: similar form but not the same pathway
• Convergent evolution of active sites is common, e.g., cytochrome P450, chloroperoxidase, NOS active site
• Convergent evolution of a protein-sized sequence is astronomically unlikely; we'll get a taste of this when we do BLAST and Karlin-Altschul statistics
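A back-of-the-envelope count gives a feel for "astronomically unlikely". The sketch below assumes, purely for illustration, that all 20 amino acids are equally likely and independent at each position and that the protein is 100 residues long; real proteins follow neither assumption, and this is not the Karlin-Altschul calculation.

```python
import math


def log10_sequence_space(length, alphabet_size=20):
    """log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)


exponent = log10_sequence_space(100)
print(f"about 10^{exponent:.1f} possible 100-residue sequences")
```

Under these assumptions, the chance of independently arriving at one specific 100-residue sequence is on the order of 1 in 10^130, whereas an active site involves only a handful of residues and is far easier to reach twice.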