genetics i (prokaryotes) it carlow bioinformatics september 2006

Post on 02-Jan-2016


Genetics I (prokaryotes)

IT Carlow Bioinformatics

September 2006

Biochemistry

• How biology works

• Mechanisms

Genetics

• How things are inherited

• Why you are like your parents but also different

• Where genes, pathways, wings, flippers come from

• How things develop from zygote

• But it’s all molecular biology nowadays

Genetics

• The interesting stuff
• “Nothing in biology makes sense except in the light of evolution” (Dobzhansky)
• “Nothing in bioinformatics makes sense except in the light of evolution” (Higgs & Attwood)
• Evolution = change in gene frequency over time
• What is gene? What is frequency? What is change? What is time? What is life?

Genome size and differences

• Species: genome size (bp), genome size (genes)
• Human: 3,000,000,000 bp, 25,000 genes
• Yeast: 12,000,000 bp, 6,500 genes
• E.coli: 4,000,000 bp, 4,000 genes
• Gene = about 1000 bp = about 300 AA
• All descended from LUCA
• How?

DNA

• Double helix
– 10Å radius (1nm, or better 1.2nm)
– 34Å for single turn
– 3.4Å for single base (0.34nm)
– 10 bp per turn
• E.coli 4Mb: how many Å³ (or nm³) of DNA?
• Size of E.coli? About 1 x 2 µm
• Thinking exercise: what % of E.coli volume is DNA?
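One way to work the thinking exercise is to treat both the DNA and the cell as plain cylinders. This is a rough sketch only: the 1 nm radius and 0.34 nm rise per base come from the slide, while modelling E.coli as a cylinder 1 µm wide and 2 µm long is an assumption made here for the arithmetic.

```python
import math

# Figures taken from the slide
GENOME_BP = 4_000_000      # E.coli genome, about 4 Mb
RISE_PER_BP_NM = 0.34      # 3.4 Angstrom per base pair
DNA_RADIUS_NM = 1.0        # 10 Angstrom radius

# Assumption (not from the slide): the cell is a cylinder 1 um wide, 2 um long
CELL_RADIUS_NM = 500.0
CELL_LENGTH_NM = 2000.0


def cylinder_volume(radius_nm: float, length_nm: float) -> float:
    """Volume of a cylinder in cubic nanometres."""
    return math.pi * radius_nm ** 2 * length_nm


dna_length_nm = GENOME_BP * RISE_PER_BP_NM            # about 1.36 mm of DNA
dna_volume_nm3 = cylinder_volume(DNA_RADIUS_NM, dna_length_nm)
cell_volume_nm3 = cylinder_volume(CELL_RADIUS_NM, CELL_LENGTH_NM)
dna_fraction = dna_volume_nm3 / cell_volume_nm3

print(f"DNA length : {dna_length_nm / 1e6:.2f} mm")
print(f"DNA volume : {dna_volume_nm3:.3g} nm^3")
print(f"Cell volume: {cell_volume_nm3:.3g} nm^3")
print(f"DNA share  : {dna_fraction:.2%}")
```

Under these assumptions the chromosome is over a millimetre long yet fills well under 1% of the cell, which is the point of the exercise.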

Mutation

• DNA damage from UV light, coal-tar

• Replicative failure (DNApol is good but..)

• Humans 2.5 x 10^-8 /bp/cell division
• E.coli 1 x 10^-7 /bp/division
• Humans/chimps 1% different, but that is 35 million differences
• You have 10^14 cells now, starting from 1
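The numbers on this slide can be combined in a short back-of-envelope calculation. The sketch below uses the slide's rate as given and assumes a 3 x 10^9 bp genome, perfectly synchronous doubling and no cell death, none of which is exact.

```python
import math

# Figures from the slides, used as given
MUTATION_RATE_PER_BP_PER_DIVISION = 2.5e-8
CELLS_IN_ADULT = 1e14

# Assumption: haploid human genome of about 3 billion bp
HUMAN_GENOME_BP = 3_000_000_000

# Rounds of doubling needed to go from 1 cell to 10^14 cells
doublings = math.log2(CELLS_IN_ADULT)

# Expected new mutations each time one genome copy is replicated
mutations_per_division = MUTATION_RATE_PER_BP_PER_DIVISION * HUMAN_GENOME_BP

print(f"Doublings from zygote to adult: {doublings:.1f}")
print(f"Expected mutations per genome per division: {mutations_per_division:.0f}")
```

Fewer than fifty rounds of doubling are enough to reach 10^14 cells, but at the quoted rate every one of those rounds adds new changes to each genome copy.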

Bases

Purines (R): A, G. PuRines Are biG

Pyrimidines (Y): C, U, T. CUT tinY

Base pairs

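• Weak: A-T, two hydrogen bonds

• Strong: G-C, three hydrogen bonds

• Tm (melting temperature) rises with G+C content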
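To make the link between strong/weak pairs and Tm concrete, here is a minimal sketch using the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). The rule is not from the slides, it only applies to short oligonucleotides, and the function name is made up for illustration.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (deg C) of a short DNA oligo.

    Wallace rule: 2 degrees for each weak (A/T) base and
    4 degrees for each strong (G/C) base.
    """
    oligo = oligo.upper()
    weak = oligo.count("A") + oligo.count("T")
    strong = oligo.count("G") + oligo.count("C")
    if weak + strong != len(oligo):
        raise ValueError("oligo must contain only A, C, G, T")
    return 2 * weak + 4 * strong


print(wallace_tm("ATATATATAT"))  # all weak pairs
print(wallace_tm("GCGCGCGCGC"))  # all strong pairs
```

Two oligos of the same length can differ a lot in Tm purely because of their G+C content.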

Mutation 2

• E.coli has mutations
• Humans have somatic and germline mutations
• Point mutation
– missense
  • transition R – R, Y – Y: C – T, G – A
  • transversion R – Y: A – T, C – G, C – A
– nonsense TGA, TAG, TAA
– Non-coding
  • Splice, 5’ 3’, Intron
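The transition/transversion rule is easy to express in code: a change within the purines or within the pyrimidines is a transition, anything else is a transversion. A small sketch, with a function name chosen here for illustration:

```python
PURINES = {"A", "G"}            # R
PYRIMIDINES = {"C", "T", "U"}   # Y


def classify_substitution(before: str, after: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    before, after = before.upper(), after.upper()
    valid = PURINES | PYRIMIDINES
    if before not in valid or after not in valid:
        raise ValueError("bases must be one of A, C, G, T, U")
    if before == after:
        raise ValueError("no change: the two bases are identical")
    pair = {before, after}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for change in [("C", "T"), ("G", "A"), ("A", "T"), ("C", "G"), ("C", "A")]:
    print(change, classify_substitution(*change))
```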

Mutation 3

• Insertions and deletions
– One bp is sometimes called “point”
– Frameshift
– ATGCCCTGCAATGAC
– ATGCCCCTGCAATGAC
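Translating the two example sequences shows what the single inserted C does to the reading frame. This is a minimal sketch using the standard genetic code. The `translate` helper is written here for illustration and simply reads codons from position 0 until the first stop.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from position 0, one-letter code, '*' marks a stop.

    Stops after the first stop codon; trailing incomplete codons are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


original = "ATGCCCTGCAATGAC"
inserted = "ATGCCCCTGCAATGAC"   # one extra C after the second codon

print(translate(original))  # MPCND
print(translate(inserted))  # MPLQ*
```

Everything downstream of the insertion is read in a new frame, and in this case the shifted frame runs straight into a TGA stop codon.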

Ooops

Methylation of C

Mutations 4

• Chromosomal rearrangement
– Inversion
– Translocation
• Chromosome copy
– Aneuploidy (Down’s)
– Polyploidy (tetraploid)
– Whole genome duplication (WGD)
• Mutational hotspots
• Repeats GCGCGCGCGC slip = microsatellites

Genetic code

The “Universal” Genetic Code

Phe UUU   Ser UCU   Tyr UAU   Cys UGU
    UUC       UCC       UAC       UGC
Leu UUA       UCA   ter UAA   ter UGA
    UUG       UCG   ter UAG   Trp UGG

Leu CUU   Pro CCU   His CAU   Arg CGU
    CUC       CCC       CAC       CGC
    CUA       CCA   Gln CAA       CGA
    CUG       CCG       CAG       CGG

Ile AUU   Thr ACU   Asn AAU   Ser AGU
    AUC       ACC       AAC       AGC
    AUA       ACA   Lys AAA   Arg AGA
Met AUG       ACG       AAG       AGG

Val GUU   Ala GCU   Asp GAU   Gly GGU
    GUC       GCC       GAC       GGC
    GUA       GCA   Glu GAA       GGA
    GUG       GCG       GAG       GGG

 

Willie Taylor’s AAs

Mutations 5

• Synonymous: usually a change at the 3rd base of the codon

• Non-synonymous
– Conservative AAA – AGA, Lys – Arg
– Radical AAA – UAU, Lys – Tyr

• CpG methylation makes CpG a mutational hotspot

• CpG islands at the 5’ end of mammalian housekeeping genes
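The synonymous / non-synonymous distinction can be checked mechanically against the standard genetic code. The sketch below only separates synonymous, non-synonymous and nonsense changes. It does not attempt the conservative/radical split, which would need an amino acid similarity scheme that the slides do not specify. The function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(before: str, after: str) -> str:
    """Classify a codon change as synonymous, non-synonymous or nonsense."""
    # Accept DNA or RNA spelling by converting T to U
    before = before.upper().replace("T", "U")
    after = after.upper().replace("T", "U")
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    if aa_before == aa_after:
        return "synonymous"
    if aa_after == "*":
        return "nonsense"
    return "non-synonymous"


print(classify_codon_change("CUU", "CUC"))  # 3rd base change, Leu stays Leu
print(classify_codon_change("AAA", "AGA"))  # Lys to Arg
print(classify_codon_change("AAA", "UAU"))  # Lys to Tyr
print(classify_codon_change("UGG", "UGA"))  # Trp to stop
```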

Mutations Quiz

[Figure: gene drawn 5’ to 3’ with exon and intron regions; mutation sites labelled A, U, T, C, G]

Which mutations AUTCG are most likely to be baaaad?

Mutations & evolution

• Most bacteria have a characteristic mutational bias.

• This will give a species-specific G+C content
– E.coli 50%
– B.subtilis 40%
– Extremes: Mycoplasma, Micrococcus

• Many bacteria have strand bias because the enzymes that copy the lagging strand (Okazaki fragments) have a different mutational bias

• High-GC and low-GC Gram-positive bacteria.
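G+C content and strand bias are both simple counts over a sequence. A minimal sketch follows. GC skew, (G - C) / (G + C), is one common way to quantify strand bias and is introduced here as an illustration, not something the slides define.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C) on one strand; 0.0 if there is no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


example = "ATGGGCGGTTAA"   # made-up sequence for illustration
print(f"G+C content: {gc_content(example):.2f}")
print(f"GC skew    : {gc_skew(example):.2f}")
```

In practice both measures are usually computed in sliding windows along a genome, so that a change in skew can be seen between the two replication arms.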

Quiz “answers”

• Location: rate (substitutions/site/year x 10^-9)

• 5’ 2.36

• Synon 4.65

• NonSyn 0.88

• Intron 3.7

• 3’ 4.46

• Pseudogene 4.85

• Synon slower than pseudogene: not neutral?

Substitution

• A mutation that’s been sieved by selection

• Selection is a population/probability term

• Probability that a mutation will a) survive? b) become a polymorphism? c) replace the existing allele?

• Depends on population size
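How the chance of replacing the existing allele depends on population size can be sketched with Kimura's diffusion approximation for a single new mutant in a diploid population. This formula is a standard population genetics result, not something taken from the slides. It assumes constant population size N (with effective size equal to N) and a selection coefficient s.

```python
import math


def fixation_probability(pop_size: int, s: float) -> float:
    """Approximate probability that one new mutant copy reaches fixation.

    Kimura's approximation for a diploid population of constant size N,
    starting frequency 1/(2N):
        P = (1 - exp(-2s)) / (1 - exp(-4Ns))
    For a neutral mutation (s = 0) this reduces to 1/(2N).
    """
    if pop_size <= 0:
        raise ValueError("population size must be positive")
    if s == 0:
        return 1.0 / (2 * pop_size)
    # expm1(x) = exp(x) - 1, accurate for small x; the signs cancel in the ratio
    return math.expm1(-2 * s) / math.expm1(-4 * pop_size * s)


for n in (100, 10_000):
    print(
        f"N={n}: neutral {fixation_probability(n, 0.0):.2e}, "
        f"s=+0.01 {fixation_probability(n, 0.01):.2e}, "
        f"s=-0.01 {fixation_probability(n, -0.01):.2e}"
    )
```

A neutral mutation fixes with probability 1/(2N), so it is rarer in large populations, whereas a mildly harmful mutation that still has a small chance in a small population is effectively never fixed in a large one.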

Bacterial genes/genomes

• E.coli about 4000 genes, 4 Mbases
• Tightly packed, usually no overlap
– Viruses ++ tightly packed, overlapping genes
• Origin of replication
– Usually near dnaA
• DNA polymerase
– Binds and copies
– Needs gyrase, helicase etc.
– 5’-3’ strand = read through
– 3’-5’ strand read in chunks: Okazaki fragments

Operons

• Jacob and Monod (and Lwoff)

• Lac operon

• lacZ lacY lacA induced and transcribed together

• lacI adjacent but separate transcript

• MolBiol? Measure mRNA levels, β-gal

• Evol? Co-transcription for better control

[Figure: lac operon map: i p o z y a]

Odd operons

• Easy explanation when only E. coli and B. subtilis available

• But M. jannaschii (first archaeon sequenced)
– Linked, cotranscribed but biochemically mad

• Fallout from genome sequencing

• tRNA complement informs expression

Bioinformatic consequences

• RNA polymerase needs binding site

• Promoter site upstream from transcription start

• -35 and -10 boxes: TTGACANNNNNNNNNNNNNNNNNTATAAT

• Site directed mutagenesis can parse the info

• Remember lacZ,Y,A cotranscribed

• Then 3’ trailer after last stop codon

• Try to think of 3-D picture
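The -35 / -10 promoter signature lends itself to a simple pattern search. The sketch below assumes the commonly cited E.coli consensus hexamers TTGACA and TATAAT separated by 16 to 18 bases. Real promoters rarely match the consensus exactly, so an exact-match scan like this is illustrative only. The function and parameter names are made up for this example.

```python
import re


def find_consensus_promoters(dna: str,
                             minus35: str = "TTGACA",
                             minus10: str = "TATAAT",
                             min_spacer: int = 16,
                             max_spacer: int = 18) -> list[tuple[int, str]]:
    """Return (start position, matched text) for exact consensus matches.

    Scans one strand only and demands a perfect match to both hexamers.
    """
    pattern = re.compile(
        f"{minus35}[ACGT]{{{min_spacer},{max_spacer}}}{minus10}"
    )
    return [(m.start(), m.group()) for m in pattern.finditer(dna.upper())]


# Made-up test sequence: consensus boxes with a 17-base spacer
example = "CC" + "TTGACA" + "ACGTACGTACGTACGTA" + "TATAAT" + "GGATG"
for position, match in find_consensus_promoters(example):
    print(position, match)
```

A more realistic search would score each candidate against a position weight matrix instead of requiring a perfect match.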

Gene structure

• Upstream control regions

• Start codon

• Open Reading Frame (ORF)

• Stop codon UGA UAG UAA

• 3’ downstream

• So gene prediction is “easy”
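The start - ORF - stop pattern is what makes bacterial gene prediction look “easy”. A minimal sketch of a naive ORF scan follows. It assumes ATG is the only start codon, looks at the three forward frames only, and ignores the reverse strand, all of which a real gene finder would have to handle.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs for ATG...stop ORFs in the 3 forward frames.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    ORFs that run off the end of the sequence without a stop are skipped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return orfs


example = "GGATGCCCTGCAATGACTAAGG"   # made-up sequence for illustration
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

The quotation marks around “easy” are earned: every long random stretch without a stop codon also looks like an ORF, so extra evidence is needed to decide whether a candidate is a real gene.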

Consequences

• This view of how the process works
– Colours our view of sequences

• Central dogma:
– DNA makes RNA makes PROTEIN makes everything else

• RNA makes DNA means inheritance of acquired characteristics (Lamarck).

• Leads to a particular definition of “gene”

Translation

• Transcription gives you mRNA
• Translation gives you protein
• In bacteria transcription and translation are simultaneous
• Ribosome: complex (cottage loaf) of two subunits, 50S and 30S = 70S
• 30S: 21 proteins (rpsX) and 16S RNA
• 50S: 34 proteins (rplX) and 23S + 5S RNA
• Needs tRNA, mRNA
• Ribosome binding site (RBS) upstream from ATG
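Checking for a ribosome binding site upstream of a candidate ATG can be sketched as a short window search. The motif AGGAGG (the E.coli Shine-Dalgarno consensus) and the 4 to 10 base gap between motif and start codon are assumptions chosen for illustration. The slides do not give a sequence or spacing, and real sites vary.

```python
def has_rbs(dna: str, start: int, motif: str = "AGGAGG",
            min_gap: int = 4, max_gap: int = 10) -> bool:
    """True if `motif` ends between min_gap and max_gap bases before `start`.

    `start` is the 0-based position of the A in the candidate ATG.
    """
    dna = dna.upper()
    for gap in range(min_gap, max_gap + 1):
        motif_end = start - gap
        motif_begin = motif_end - len(motif)
        if motif_begin >= 0 and dna[motif_begin:motif_end] == motif:
            return True
    return False


# Made-up sequences for illustration
with_rbs = "TTAGGAGGTAACATATGAAA"
without_rbs = "TTTTTTTTTTTTTTATGAAA"

print(has_rbs(with_rbs, with_rbs.find("ATG")))
print(has_rbs(without_rbs, without_rbs.find("ATG")))
```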

Summary

• What we know about the genetics can help us identify genes bioinformatically
– DNA signatures (RBS, Promoter)
– Start - ORF - stop pattern
– Consistent codon usage

• Have we predicted a real gene?
– Is it present as mRNA?
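“Consistent codon usage” can be checked by tabulating how often each codon appears in a candidate ORF and comparing that with known genes from the same organism. A minimal sketch of the counting step is below. The comparison against a reference set is left out, and the function name is illustrative.

```python
from collections import Counter


def codon_usage(orf: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame ORF.

    Reads codons from position 0; a trailing partial codon is ignored.
    """
    orf = orf.upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


example_orf = "ATGAAAAAAAAGTAA"   # made-up ORF: Met Lys Lys Lys stop
for codon, freq in sorted(codon_usage(example_orf).items()):
    print(codon, f"{freq:.2f}")
```

Here lysine is encoded by AAA twice and AAG once. A predicted gene whose preferences differ sharply from the organism's known genes deserves a second look.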
