genetics ii (eukaryotes) it carlow bioinformatics september 2006


Upload: lilian-nicholson

Post on 26-Dec-2015


Genetics II (eukaryotes)

IT Carlow Bioinformatics

September 2006

Homo sapiens

• That’s us.

• 3.1 Gbases, 25,000 genes

• Genetic code same as E. coli

– Hence “universal”

• DNA replication (DNApol)

• Transcription (RNApol)

• Ribosomes, translation

• So “essentially” the same?

Other Eukaryotes

• Mouse, Rat, Cow, Chimp etc.

– Chimp/human: last common ancestor 5 mya

– Mouse/rat: last common ancestor 30 mya

– Mouse/human: last common ancestor 100 mya

– Chicken/human: last common ancestor 300 mya

• C. elegans 19,000 genes, 959 somatic cells (302 neurons), 97 Mbase

• Drosophila 14,000 genes, 180 Mbase

• S. cerevisiae 6,000 genes, 12 Mbase
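The gene counts and genome sizes quoted on these slides can be turned into a rough gene density, which makes the contrast between yeast and human easy to see. The sketch below is illustrative only: the dictionary and function names are made up for this example, and the numbers are simply the round figures from the slides.

```python
# Gene count and genome size (in megabases) as quoted on the slides.
# These are round lecture figures, not curated database values.
GENOMES = {
    "H. sapiens": (25_000, 3100.0),
    "C. elegans": (19_000, 97.0),
    "Drosophila": (14_000, 180.0),
    "S. cerevisiae": (6_000, 12.0),
}


def genes_per_mb(gene_count, size_mb):
    """Average number of genes per megabase of genome."""
    return gene_count / size_mb


def density_table(genomes=GENOMES):
    """Map each organism name to its average gene density."""
    return {name: genes_per_mb(g, s) for name, (g, s) in genomes.items()}


if __name__ == "__main__":
    # Print from the sparsest genome to the densest.
    for name, density in sorted(density_table().items(), key=lambda kv: kv[1]):
        print(f"{name:14s} {density:7.1f} genes/Mb")
```

On these figures yeast packs about 500 genes into each megabase while human manages fewer than 10, which is one way of framing the "junk" question raised later.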

Eukaryotes have a nucleus

• DNA bundled in discrete units – chromosomes

– Ends need capping, telomerase issues

• Bundling = additional access complications

– histones, supercoiling

• Nucleus forces decoupling of transcription and translation

• Two-way traffic in and out of the nucleus

– NF-κB, transcriptional regulators

Operons?

• In general not.

• But yeast often has common promoters on divergent (opposite-strand) genes

• Singer, Lloyd, Huminiecki, Wolfe 2005

– Find tissue-specific clusters (spleen-expressed)

– Chance or “design”?

– Compare human and mouse cluster breaks

Operons in Mammals?

Telomeres

• Eukaryotic chromosomes are linear

• Chromosomes seem to have fixed location.

• Telomeres have characteristic # of repeats

– Human TTAGGG, Oxytricha TTTTGGGG

• Chrs get shorter each (cell) generation

– Priming for Okazaki fragments leaves the lagging-strand end uncopied

– Telomerase adds repeats

– Telomerase fails: cancer, senescence
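Since telomeres are tandem copies of a short unit, the number of repeats at a chromosome end can be counted with a regular expression. This is a minimal sketch on toy sequences; the function name is hypothetical and real telomeres contain degenerate repeats that this exact-match pattern would not count.

```python
import re


def count_terminal_repeats(seq, unit="TTAGGG"):
    """Count exact tandem copies of `unit` at the 3' end of `seq`.

    Returns 0 when the sequence does not end in a copy of `unit`.
    """
    pattern = f"(?:{re.escape(unit)})+$"
    match = re.search(pattern, seq.upper())
    if match is None:
        return 0
    return len(match.group(0)) // len(unit)


if __name__ == "__main__":
    human_like = "ACGT" + "TTAGGG" * 3        # toy sequence, human repeat unit
    oxytricha_like = "GG" + "TTTTGGGG" * 2    # toy sequence, Oxytricha repeat unit
    print(count_terminal_repeats(human_like))                  # 3
    print(count_terminal_repeats(oxytricha_like, "TTTTGGGG"))  # 2
```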

How similar is the machinery?

• DNA polymerase size % ID

• RNA polymerase

• Ribosomes – rRNA bigger: 5S, 5.8S, 18S and 28S

– Bases: 120, 160, 1900, 4700

– Protein count: 50 rplX & 33 rpsX

tRNA

• Essential mediators of translation

• 74-90 bases in size, clover-leaf structure

• Anti-codon loop

– Curved so “wobble” is possible at third position

– One anti-codon can serve 2 or 3 codons

• XXG can pair with C … or U

• XXI (inosine) can pair with A, C or U
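The wobble rules above amount to a small lookup table: the base at the wobble position of the anticodon decides which bases are accepted at the third position of the codon. The sketch below encodes the two rules from the slide (G and I) and, as an addition not on the slide, the remaining standard Crick wobble pairings for C, A and U. The names `WOBBLE` and `codons_read` are invented for this example.

```python
# Key: base at the wobble position of the anticodon.
# Value: bases it can pair with at the third position of the codon.
# The G and I rows come from the slide; C, A and U are the standard
# Crick wobble pairings, added here for completeness.
WOBBLE = {
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "G": {"C", "U"},
    "I": {"A", "C", "U"},
}


def codons_read(codon_prefix, wobble_base):
    """List the codons starting with `codon_prefix` (the first two codon
    bases) that one anticodon with `wobble_base` can read."""
    return sorted(codon_prefix + third for third in WOBBLE[wobble_base])


if __name__ == "__main__":
    print(codons_read("UU", "G"))  # ['UUC', 'UUU']
    print(codons_read("AU", "I"))  # ['AUA', 'AUC', 'AUU']
```

The function takes the first two codon bases rather than a full anticodon so that the example does not depend on whether the anticodon is written 5'→3' or 3'→5'.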

Introns

• About 5% of yeast genes

• Most mammalian genes

• Alternative splicing

– May explain why we are more complex than worms

– Challenges dogma 1 gene = 1 protein

– Accounts for 80,000 different proteins

Intron splice site

Alternative splicing 1

• Splice / don’t splice

• If stop codon in frame in intron then truncated protein.

• Can be used as a genetic switch to control production of two alternative proteins
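The splice / don't splice case can be illustrated by building both transcripts from a toy gene and looking for the first in-frame stop codon in each. All sequences below are invented for the example (the intron merely follows the common GT...AG pattern), and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds):
    """Return the 0-based position of the first in-frame stop codon,
    reading from position 0, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def transcripts(exon1, intron, exon2):
    """Build the spliced and the intron-retaining version of a two-exon gene."""
    return {
        "spliced": exon1 + exon2,
        "retained": exon1 + intron + exon2,
    }


if __name__ == "__main__":
    # Toy sequences: the intron carries an in-frame TGA.
    forms = transcripts("ATGGCC", "GTATGACAG", "GGATTTTAA")
    for name, seq in forms.items():
        stop = first_in_frame_stop(seq)
        print(f"{name:9s} stop at {stop}: {stop // 3} codons translated")
```

Here the spliced form is translated for 4 codons before its stop, while the retained intron introduces a stop after 3 codons, giving the truncated protein described above.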

Alternative splicing 2

• Competing 5' or 3' Splice Site

• Here two different 3’ splice sites

• Proximal, distal

Alternative splicing 3

• Exon skipping

• Could be more than one exon skipped

• Lots of potential for variant transcripts

• Slightly different enzymes

• Missing protein domains

Alternative splicing 4

• Mutually exclusive exons

• Here exons 1, 2, & 4 or 1, 3, & 4

• Two different forms of protein
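Exon skipping and mutually exclusive exons can both be treated as ways of enumerating exon combinations. The following sketch uses hypothetical function names and plain exon numbers; it assumes that the first and last exons are always kept and that exon order is preserved, which is a simplification.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """All isoforms that keep the first and last exon and any subset of
    the internal exons, in their original order (needs >= 2 exons)."""
    first, *internal, last = exons
    isoforms = []
    # Start with the full-length form, then skip more and more exons.
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            isoforms.append((first, *kept, last))
    return isoforms


def mutually_exclusive_isoforms(upstream, alternatives, downstream):
    """One isoform per alternative exon; exactly one alternative is used."""
    return [(*upstream, alt, *downstream) for alt in alternatives]


if __name__ == "__main__":
    print(exon_skipping_isoforms([1, 2, 3, 4]))
    # [(1, 2, 3, 4), (1, 2, 4), (1, 3, 4), (1, 4)]
    print(mutually_exclusive_isoforms([1], [2, 3], [4]))
    # [(1, 2, 4), (1, 3, 4)]
```

With n internal exons, unrestricted skipping gives 2^n possible transcripts, which is where the "lots of potential for variant transcripts" comes from.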

Alternative splicing 5

• That’s just 1 classification

– Can you think of another?

• Bioinformatics consequences

– Gene prediction difficult in eukaryotes

– No one answer in any one case

– ESTs as a bioinformatics tool for prediction

Junk?

• Human genome 3Gb but only 25K genes

• Even when introns accounted for

• 3% genome coding for “genes”

• 1% is actual codons

• The rest?
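The percentages on this slide are easier to appreciate as absolute numbers. The short calculation below uses only the slide's round figures (3 Gb, 25K genes, 3% and 1%); the variable and function names are made up for the example, and the per-gene figure is an average that hides a very wide spread.

```python
GENOME_BP = 3_000_000_000  # "3Gb" on the slide
GENE_COUNT = 25_000        # "25K genes" on the slide


def bases_for_percent(percent, genome_bp=GENOME_BP):
    """Number of bases that make up `percent` percent of the genome."""
    return genome_bp * percent // 100


genic_bp = bases_for_percent(3)   # genome "coding for genes"
codon_bp = bases_for_percent(1)   # actual codons
codons_per_gene = codon_bp // GENE_COUNT // 3

if __name__ == "__main__":
    print(f"genic: {genic_bp // 1_000_000} Mb")    # 90 Mb
    print(f"codons: {codon_bp // 1_000_000} Mb")   # 30 Mb
    print(f"average codons per gene: {codons_per_gene}")  # 400
```

So on these figures the protein-coding part of an average gene is about 1,200 bases, and roughly 2,970 Mb of the genome is something other than codons.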

Pseudogenes

• Defined as gene inactivated because of mutation

– Most obviously by nonsense/stop codon mutation

– Genetic code arranged so many mutations tolerable

– Once inactivated more mutations accumulate

• Processed pseudogene

– Reverse transcriptase copy of mRNA

– Lacks introns, 5’ upstream control regions

• 1/3rd of human genome gene and gene related

– pseudogenes,

– gene fragments, truncated genes

– introns/UTRs

Repetitive elements

• 2/3rd of genome “intergenic”

– 1400Mb interspersed repeats (transposable elements), 44% of genome

• 640Mb LINEs, LINE-1

• 420Mb SINEs, Alu, a million copies

• 250Mb LTR, ERV, 200,000 copies

• 90Mb DNA transposons, PiggyBac, 2000 copies

– 600Mb Microsatellites etc.

• 90Mb CACACA and other repeats (forensics)
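A quick check shows that the four classes of interspersed repeat add up to the 1400Mb quoted. The sketch below also converts a size in Mb to a share of the genome; the total genome size is left as a parameter because the slides quote both 3.1 Gbases and 3Gb, so any percentage printed depends on that assumption. Names are hypothetical.

```python
# Sizes in Mb, as listed on the slide.
REPEATS_MB = {
    "LINEs": 640,
    "SINEs": 420,
    "LTR elements": 250,
    "DNA transposons": 90,
}


def total_mb(classes=REPEATS_MB):
    """Sum of all repeat classes in Mb."""
    return sum(classes.values())


def percent_of_genome(mb, genome_mb):
    """Share of the genome, in percent, for a region of `mb` megabases."""
    return 100.0 * mb / genome_mb


if __name__ == "__main__":
    genome_mb = 3100  # assumption: the "3.1 Gbases" figure from the first slide
    print(f"interspersed repeats: {total_mb()} Mb")
    for name, mb in REPEATS_MB.items():
        print(f"{name:16s} {percent_of_genome(mb, genome_mb):5.1f} % of genome")
```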

A bit of history

• 1859 Darwin, Origin of Species

• 1860s Mendel sends ms to Darwin (ignored)

• 1909 Gene “invented”

• 1910 Genes sit on chromosomes, in order

• 1941 One gene = one enzyme

• 1944 Genes definitely DNA

• 1953 Double helix

• 1977 Splicing

• 1993 MicroRNA identified

What is a gene?

• Nature 25 May 2006 News Feature p399-401

• Plants (Hothead), now mice may hold RNA copy of gene to “correct” DNA!

• ENCODE project: Encyclopedia of DNA Elements

– Close look at 1% of human genome

• Alternative splicing (1977) can be fitted in.

• 5% of genome transcribed as read-through!

• Exons can combine with exons many genes away!

• 63% of mouse genome transcribed!

• 8/500 non-coding RNAs essential for signalling and growth

Bioinformatic consequences

• Pseudogenes a bioinformatics problem

– Transcribed? See ESTs

• Alternative splicing a gene prediction problem

– Exon prediction “easy”

– Gene prediction harder

• Careers in RNA bioinformatics.