We have talked about chromosome organization, what about genome organization?

Post on 21-Dec-2015

Eukaryotic genomes are complex, and DNA amounts and organization vary widely between species.

• C value paradox: the amount of DNA in the haploid cell of an organism is not related to its evolutionary complexity or number of genes.

• There are different classes of eukaryotic DNA based on sequence complexity.

Reassociation Kinetics

3 Main Components in Eukaryotic Genomes

The human genome

- Two versions of the human genome sequence were published in February 2001

- DNA sequences that encode proteins make up only 5% of the genome

- ~50% of the sequence consists of transposable elements

- Clusters of gene-rich regions are separated by gene deserts

- Chromosome 19 has the highest gene density; chromosomes 13 and Y show the lowest gene density

The human genome

- Total gene number is estimated at 30,000-40,000, with an average gene size of 27 kb

- Hundreds of genes share homology with those of bacteria

- The number of introns varies greatly (from 0 for histone genes to 234 for titin)

The human genome

- Genes are larger and contain more and larger introns than those in invertebrates (the dystrophin gene is 2.5 Mb)

- Genes are not evenly spaced on chromosomes

- The most common genes include those involved in nucleic acid metabolism (7.5%), receptors (5%), protein kinases (2.8%) and cytoskeletal structural proteins (2.8%)

Genome organization in plants

- Size of genome varies widely (100 Mb-5,500 Mb)

- Many tandem gene duplications & larger duplications; some interchromosomal duplications also observed

- Large-genome plants also have genes clustered with long stretches of intergenic DNA

- In maize, the intergenic sequences are composed mainly of transposons

Single Copy Sequences

Genes can be difficult to identify/predict.

Why?

The human genome turns out to have fewer than half as many genes (30,000 to 40,000) as we predicted (100,000). Why?

Drosophila – 13,000

Nematode – 19,000

Problems?

• It is more complicated than that.

• Some gene products are RNA (tRNA, rRNA, others) instead of protein

• Some nucleic acid sequences that do not encode gene products (noncoding regions) are necessary for production of the gene product (protein or RNA).

Coding region

Noncoding regions

• Regulatory regions
  – RNA polymerase binding site
  – Transcription factor binding sites

• Introns

• Polyadenylation [poly(A)] sites

Unique genes

Promoters

• Sequences can be quite distant from coding region

Introns/exons

• Most eukaryotic genes have introns

• Introns are often much longer than exons

• Often many introns

• mRNA much shorter than genomic DNA

• Can vary between the same gene in different species

Splice Sites

• Eukaryotes only

• Removal of internal parts of the newly transcribed RNA

• Takes place in the cell nucleus

• Splice sites are difficult to predict

Alternative splicing

• Different splice patterns from the same sequence, therefore different products from the same gene.
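
As a rough illustration of how one sequence can yield several products, the sketch below joins exon segments of a toy pre-mRNA and enumerates the isoforms produced by skipping internal exons. The sequence, the exon coordinates and the function names (`splice`, `exon_skipping_isoforms`) are all hypothetical, and exon skipping is only one of several alternative splicing patterns.

```python
from itertools import combinations


def splice(pre_mrna, exons):
    """Join exon segments in order; exons are (start, end) pairs, 0-based, end-exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def exon_skipping_isoforms(pre_mrna, exons):
    """Return the set of transcripts that keep the first and last exon
    and any subset of the internal exons (exon skipping only)."""
    first, *internal, last = exons  # needs at least two exons
    isoforms = set()
    for n_kept in range(len(internal) + 1):
        for kept in combinations(internal, n_kept):
            isoforms.add(splice(pre_mrna, [first, *kept, last]))
    return isoforms


# Toy pre-mRNA: exons in upper case, introns in lower case (made-up sequence).
toy_pre_mrna = "AAAA" + "gtccag" + "CCCC" + "gtttag" + "GGGG"
toy_exons = [(0, 4), (10, 14), (20, 24)]

for isoform in sorted(exon_skipping_isoforms(toy_pre_mrna, toy_exons)):
    print(isoform)
```

With three exons the sketch gives two isoforms, one containing all exons and one that skips the middle exon, so a single gene sequence gives rise to two different products.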

Alternative splicing

• Multiple promoters

• Multiple terminators

• Alternatively spliced introns

• 59% of genes

• Average of ~3 forms

Exon Shuffling

Why genome size doesn’t matter

• More sophisticated regulation of expression?

• Proteome vastly larger than genome?
  – Alternative splicing
  – RNA editing

• Post-translational modifications?

• Cellular location?

• Moonlighting

Gene Identification

• Open reading frames

• Sequence conservation
  – Database searches
  – Synteny

• Sequence features
  – CpG islands

• Evidence for transcription
  – ESTs, microarrays, SAGE

• Gene inactivation
  – Transformation, TEs, RNAi

Open reading frames

5'                                                  3'
atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa

1  atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
    M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *

2   tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
     C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y

3    gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata
      A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I
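
The three-frame translation on this slide can be reproduced with a short script. The following is a minimal sketch, assuming the standard genetic code and the forward strand only (a real ORF search would also scan the reverse complement); the names `CODON_TABLE`, `translate`, `forward_frames` and `longest_orf` are illustrative and do not come from the slides.

```python
import re
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate one forward reading frame (frame = 0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


def longest_orf(protein):
    """Longest stretch that starts with M and ends at a stop; '' if there is none."""
    candidates = re.findall(r"M[^*]*\*", protein)
    return max(candidates, key=len, default="")


slide_sequence = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"
for number, protein in enumerate(forward_frames(slide_sequence), start=1):
    print(number, protein, "| longest ORF:", longest_orf(protein) or "none")
```

Only frame 1 runs from a start codon to a stop codon without interruption; frames 2 and 3 hit stop codons early, which is why long open reading frames are used as a first hint that a region may be coding.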

Database searches

[Figure: multiple sequence alignment of six CBF protein sequences (PtCBF, LeCBF, AtCBF1, BnCBF, TaCBF, ScCBF). The N-terminal half is strongly conserved (for example, the stretches ETRHP and WLGTF appear in all six sequences), while the C-terminal regions contain many gaps and substitutions.]

Synteny

CpG islands

• CpG is subject to methylation, and most eukaryotes (Drosophila is an exception) contain fewer CpG dinucleotides than their base composition would predict. Regions where CpG is concentrated can be detected using restriction enzymes whose recognition sequences include CpG.

CpG islands

• Defined as regions of DNA of at least 200 bp in length that have a G+C content above 50% and a ratio of observed vs. expected CpGs close to or above 0.6.

• Used to help predict gene sequences, especially promoter regions.
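
The definition above translates directly into a check on a candidate sequence. This is a minimal sketch: the slide does not say how the expected CpG count is computed, so the code assumes the commonly used formula (number of C × number of G) / length, and it treats "close to or above 0.6" as a hard cutoff of 0.6. The function names `cpg_stats` and `looks_like_cpg_island` are hypothetical.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Assumed formula: expected CpG = (C count * G count) / length
    if c_count and g_count:
        obs_exp = cpg_count * length / (c_count * g_count)
    else:
        obs_exp = 0.0
    return length, gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three criteria from the slide to one candidate region."""
    length, gc_fraction, obs_exp = cpg_stats(seq)
    return length >= min_length and gc_fraction > min_gc and obs_exp >= min_obs_exp


print(looks_like_cpg_island("CG" * 100))  # CpG-rich toy sequence of 200 bp
print(looks_like_cpg_island("AT" * 100))  # AT-only toy sequence of 200 bp
```

In practice the test is applied to a window that slides along the genome rather than to a single preselected region.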

Evidence for Transcription

• cDNAs, ESTs (expressed sequence tags)

• microarrays

Gene families

• E.g. globins, actin, myosin

• Clustered or dispersed

• Pseudogenes

Pseudogenes

• Nonfunctional copies of genes

• Formed by duplication of ancestral gene, or reverse transcription (and integration)

• Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences

Duplicated genes

• Encode closely related (homologous) proteins

• Formed by duplication of an ancestral gene followed by mutation

Five functional genes and two pseudogenes

Coding sequences less than 5% of the genome!

Noncoding RNAs

• Do not have translated ORFs

• Small

• Not polyadenylated

Noncoding RNAs

• Transfer RNAs
  – < 500

• Ribosomal RNAs
  – Tandem arrays on several chromosomes

• Small nucleolar RNAs (snoRNAs)
  – Single genes

• Small nuclear RNAs (snRNAs)
  – Spliceosomes
  – Multiple dispersed copies

• Many pseudogenes

• Some noncoding sequences are being found to be highly evolutionarily conserved across diverse species over millions of years. Some of them are in “gene deserts”. They must have a function to be maintained. What is it?

Repetitive DNA

• Moderately repeated DNA
  – Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)
  – Large duplicated gene families
  – Mobile DNA

• Simple-sequence DNA
  – Tandemly repeated short sequences
  – Found in centromeres and telomeres (and others)
  – Used in DNA fingerprinting to identify individuals

Segmental duplications

• Found especially around centromeres and telomeres

• Often come from nonhomologous chromosomes

• Many can come from the same source

• Tend to be large (10 to 50 kb)

• Unique to humans?

Repeat sequences – 50% or more of the genome

Mobile DNA

• Moves within genomes

• Most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes
  – L1 LINE is ~5% of human DNA (~50,000 copies)
  – Alu is ~5% of human DNA (>500,000 copies)

• Some encode enzymes that catalyze movement

Transposon derived repeats

• Long interspersed elements – LINEs

• Short interspersed elements - SINEs

• LTR (long terminal repeat) retrotransposons

• DNA transposons

• 45% or more of genome

RNA or DNA intermediate

• Transposon moves using DNA intermediate

• Retrotransposon moves using RNA intermediate

LINEs

• LINE1 – active

• LINE2 – inactive

• LINE3 – inactive

• Many truncated inactive sequences

Exception – Alu elements

• Derived from the 7SL RNA of the signal recognition particle

• Does not share its 3’ end with a LINE

• Only active SINE in the human genome

LTR (long terminal repeat)

• Flank viral retrotransposons and retroviruses

• Repeats contain genes necessary for movement and replication

• Retroviruses have acquired a CP gene

• Many fossils

DNA transposons

• Terminal inverted repeats

• Transposase

• 7 major classes

• Transposition doesn’t occur in humans anymore

• Horizontal transfer

• Different regions of the genome differ in density of repeats

• Most LINEs accumulate in AT rich regions

• Alu elements accumulate in GC rich regions – why? Promote protein translation under stress?

Simple sequence repeats

• Tandem repeats of a particular k-mer

• 1 – 13 base repeat unit – microsatellites
  – Trinucleotide repeats

• 14 – 500 base repeat unit – minisatellites
  – “Variable numbers of tandem repeats”

• 3% of genome

• Used in mapping
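
To make "tandem repeats of a particular k-mer" concrete, here is a minimal sketch that scans a sequence for back-to-back copies of a unit of fixed length. The function name `find_tandem_repeats`, the `min_copies` threshold and the example sequence are assumptions made for illustration; dedicated repeat finders also handle imperfect copies, which this sketch does not.

```python
def find_tandem_repeats(seq, unit_len, min_copies=3):
    """Return (start, unit, copies) for each run of perfect tandem repeats.

    Note: a homopolymer run such as AAAAAAAAA is also reported when
    unit_len > 1 (as AAA x 3), because this sketch does not reduce a
    unit to its shortest period.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next block equals the repeat unit.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # continue after the run
        else:
            i += 1
    return hits


# Made-up sequence containing a trinucleotide repeat (CAG x 4).
example = "AAA" + "CAG" * 4 + "TTT"
print(find_tandem_repeats(example, unit_len=3))
```

Because the number of copies at a given locus varies between individuals, comparing the reported copy counts across samples is the basis for using these repeats as mapping markers.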
