Sequence analysis: what is a sequence? Linear arrangement of chemical subunits. Contains information: 3-D arrangement determined by the sequence; 3-D defines function

Upload: julia-arnold

Post on 27-Dec-2015

TRANSCRIPT

Sequence analysis: what is a sequence?

• Linear arrangement of chemical subunits

• Contains information: 3-D arrangement determined by the sequence; 3-D defines function

Gene

• Confers some trait: i.e., unit of information

• How is this information used?

  – passed on to next generation

  – put into a form useful for doing cellular work

• Specific sequence of DNA (a molecule)

• In vivo, found in one specific place

Nucleic acid sequences store information

• Linear arrangement of chemical subunits; chemistry confers direction

• 3-D arrangement determined by sequence; 3-D arrangement defines function

• DNA: subunits = nucleotides A, T, G, C

Sources of sequence information

• Chemical reactions on polymers (sequential degradation into monomers)

• Translation (more later)

Sequencing gel -- primary source of sequence data

• Automated sequencing uses the Sanger method (also called the dideoxy or enzymatic method)

• Relies on the enzyme DNA polymerase and chemically modified nucleotides = dideoxynucleotides (supplied as ddNTPs, incorporated as ddNMP)

• When a ddNMP is incorporated into the growing DNA chain, the chain stops; the dideoxynucleotide concentration is set so that, across the population of chains, termination occurs at every occurrence of A, T, G or C
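The chain-termination idea can be sketched in code. This is a toy illustration, not a model of the real chemistry: the function names `sanger_fragments` and `read_gel` are made up for this example, and the code simply lists one terminated chain per occurrence of each base instead of simulating random termination. The example sequence is the one used in In-class exercise I.

```python
def sanger_fragments(synthesized: str) -> dict[str, list[int]]:
    """For each base, list the lengths of chains that stop at that base.

    `synthesized` is the strand built by the polymerase, written 5'->3'.
    Assumption: the dideoxynucleotide concentration is tuned so that,
    across many chains, termination happens at every occurrence of a base.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)  # a chain of this length ends in `base`
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence by ordering fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = sanger_fragments("AGTCAGTC")
print(lanes)            # fragment lengths per lane (one lane per base)
print(read_gel(lanes))  # sequence recovered from the fragment lengths
```

Each "lane" holds the fragment lengths for one dideoxynucleotide; reading fragments from shortest to longest recovers the order of bases, which is what reading a sequencing gel from bottom to top does.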

Diagram of sequencing gel

Traces of sequencing gel

In-class exercise I: nucleic acid polymer

• 1) draw chemical bonds of sequence AGTCAGTC

• 2) predict complementary sequence

• 3) sketch 3-D structure

• 4) view sequence of actual gene in GCG

• 5) view 3-D structure in file
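Step 2 of the exercise can be checked with a few lines of code. This is a minimal sketch using standard Watson-Crick pairing (A with T, G with C); the names `COMPLEMENT`, `complement` and `reverse_complement` are chosen for this example only.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5'->3' (the strands are antiparallel)."""
    return complement(seq)[::-1]


print(complement("AGTCAGTC"))
print(reverse_complement("AGTCAGTC"))
```

Because the two strands run in opposite directions, the complementary strand is usually reported as the reverse complement so that it also reads 5'->3'.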

Protein sequences store information

• Directional sequence of subunits = amino acids, 20 of them abbreviated as letters

• Function depends on structure depends on sequence

• Proteins (enzymes) do the work of life; work defined by sequence

5 10 15 20 25 30

1 A A S X D X S L V E V H X X V F I V P P X I L Q A V V S I A

31 T T R X D D X D S A A A S I P M V P G W V L K Q V X G S Q A

61 G S F L A I V M G G G D L E V I L I X L A G Y Q E S S I X A

91 S R S L A A S M X T T A I P S D L W G N X A X S N A A F S S

121 X E F S S X A G S V P L G F T F X E A G A K E X V I K G Q I

151 T X Q A X A F S L A X L X K L I S A M X N A X F P A G D X X

181 X X V A D I X D S H G I L X X V N Y T D A X I K M G I I F G

211 S G V N A A Y W C D S T X I A D A A D A G X X G G A G X M X

241 V C C X Q D S F R K A F P S L P Q I X Y X X T L N X X S P X

271 A X K T F E K N S X A K N X G Q S L R D V L M X Y K X X G Q

301 X H X X X A X D F X A A N V E N S S Y P A K I Q K L P H F D

331 L R X X X D L F X G D Q G I A X K T X M K X V V R R X L F L

361 I A A Y A F R L V V C X I X A I C Q K K G Y S S G H I A A X

391 G S X R D Y S G F S X N S A T X N X N I Y G W P Q S A X X S

421 K P I X I T P A I D G E G A A X X V I X S I A S S Q X X X A

451 X X S A X X A

A protein sequence

So what?

Flow of molecular information

In-class exercise II: translation

• Given the RNA sequence UUUUGUAGACUUCAUCGACCC, predict the amino acid sequence coded for.
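A hand translation can be checked against a short script. The sketch below assumes the standard genetic code, reads the RNA in the first reading frame from its first base, and stops at a stop codon; the names `CODON_TABLE` and `translate` are illustrative, not taken from the course material.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate RNA in frame 1 into one-letter amino acid codes."""
    protein = []
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("UUUUGUAGACUUCAUCGACCC"))
```

The script does not look for a start codon; it treats the given sequence as already in frame, which matches how the exercise is posed.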

In-class exercise III: protein chemistry

• Draw chemical bonds of protein sequence KRETWA

Information Theory Primer – Tom Schneider

• http://www-lmmb.ncifcrf.gov/~toms/paper/primer/latex/index.html

Evolution -- general principles

• Individuals in a population of any species vary in many heritable traits

• Any population of a species has the potential to produce far more offspring than the environment can support; this leads to competition.

• Individuals with traits favorable to winning the competition will reproduce more, leading to higher representation of such traits in the population. (Natural selection)

Genetics and evolution

• Evolution happens in populations, not in individuals

• Variability seen in populations is a result of genetics; especially sexual recombination

• Variability of populations is nonlinear

Molecular evolution

• DNA changes lead to protein changes

• Protein changes can lead to new functions

• Molecular changes are linear: accumulation of mutations over time

• Mixing of different forms of molecules = sexual recombination; but sexual recombination does not affect the molecules

Tracing Hemophilia in the Royal Houses of Europe

[Figure: pedigree. Labels: Victoria, Queen of England; Prince Albert; King Edward VII; Duke Leopold; Princess Alice; Princess Beatrice; grandson Alexis, Tsarevich of Russia; Alfonso, Crown Prince of Spain; present British royals (unaffected)]

Molecular evolution: what changes actually happen?

• Substitutions

• Deletions, insertions

• Rearrangements (inversions, transpositions)

• Repeats

Substitutions

ACCTGAACTTTACCT

ACCTGAAATTTACCT

Insertions/deletions

ACCTGAACTTACCT

ACCTGAAACCT

ACCTGAA---ACCT

Rearrangements

INVERSION: ACCTGAACTTACCT

ACCTGAATTCACCT

Repeats

ACCTGAACTTACCT

ACCTGAACTTCTTACCT
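The four kinds of change on the preceding slides can be reproduced as simple string operations. This is a sketch for illustration only: the function names are invented here, positions are 0-based with an exclusive end, and `invert` performs the plain reversal drawn on the slide (in double-stranded DNA an inverted segment would read as its reverse complement).

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at `pos` with `base`."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq: str, start: int, end: int) -> str:
    """Remove the segment seq[start:end]."""
    return seq[:start] + seq[end:]


def invert(seq: str, start: int, end: int) -> str:
    """Reverse the segment seq[start:end] in place (as on the slide)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def duplicate(seq: str, start: int, end: int) -> str:
    """Insert a second copy of seq[start:end] right after the first."""
    return seq[:end] + seq[start:end] + seq[end:]


print(substitute("ACCTGAACTTTACCT", 7, "A"))  # C -> A at position 7
print(delete("ACCTGAACTTACCT", 7, 10))        # removes CTT
print(invert("ACCTGAACTTACCT", 7, 10))        # CTT becomes TTC
print(duplicate("ACCTGAACTTACCT", 7, 10))     # CTT repeated in tandem
```

An insertion is the same event as a deletion seen from the other sequence, which is why the two are grouped together and drawn with gap characters in an alignment.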

Similarity I

• Quantifiable attribute: e.g., % identity or alignment score

• Evolutionarily related regions will be similar in some measurable way (structure or sequence); similar regions are not necessarily evolutionarily related

Similarity II

• High degrees of sequence similarity (30% identity) indicate evolutionary relationship; intermediate degrees of sequence similarity (15% identity) don’t necessarily; evolutionarily related molecules may show low degrees of sequence similarity (but high structural similarity)
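Percent identity is easy to compute once two sequences are aligned. The sketch below is one possible convention, stated as an assumption: identical columns divided by the full alignment length, with gap columns counted as mismatches. Other conventions (for example dividing by the shorter sequence length) give different numbers. The function name `percent_identity` is made up for this example, and the input is the gapped pair from the insertions/deletions slide.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumption: the denominator is the alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


identity = percent_identity("ACCTGAACTTACCT", "ACCTGAA---ACCT")
print(f"{identity:.1f}% identity")
```

A single number like this says nothing about where the matches fall, which is one reason an alignment score is often reported alongside it.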

Homology

• Homology is the conclusion from similarity data that structures and/or sequences share a common evolutionary pathway

• Divergence from a common ancestor via substitutions, deletions, insertions, etc.

• Conserved regions indicate sequences/structures important to function

Modular nature of proteins

• Many proteins are modular: some regions have one evolutionary pathway, others have another; the different regions interact to form a new function

• Example: NOS

NOS modular structure

PDZ domain

Oxygenase domain

CaM site

FMN binding domain

45 amino acid insert

FAD binding domain

NADPH binding domain

Caveats to homology

• Very closely related species might not have had time to diverge -- high similarity doesn’t indicate importance to function

• Evolutionary relationships evident in sequence give history, but are not always relevant to current function

• Convergent evolution: similar form but not same pathway

• Convergent evolution of active sites is common – cytochrome P450, chlorooxygenase, NOS active site

• Convergent evolution of a protein-sized sequence is astronomically unlikely; we’ll get a taste of this when we do BLAST and Karlin-Altschul statistics
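A back-of-the-envelope calculation shows why "astronomically unlikely" is fair. This is not Karlin-Altschul statistics, only a much cruder estimate under a loudly stated assumption: all 20 amino acids are equally likely and positions are independent, so two random sequences of length n match exactly with probability 20^-n. The function name is invented for this sketch.

```python
import math


def log10_chance_identity(length: int, alphabet_size: int = 20) -> float:
    """log10 of the probability that two random sequences are identical.

    Assumption: residues are uniform and independent, so the probability
    is alphabet_size ** -length. Logs are used to avoid float underflow.
    """
    return -length * math.log10(alphabet_size)


for n in (10, 100, 300):
    print(f"length {n}: probability about 10^{log10_chance_identity(n):.0f}")
```

Real proteins are neither uniform in composition nor required to match exactly, so this overstates the case, but the exponent stays enormous even with generous corrections. A short active site involves far fewer positions, which is consistent with convergence being observed there.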