SIBC 511: STRUCTURE AND BEHAVIOR OF GENE AND CHROMOSOME
Chatchawan Srisawat M.D., Ph.D.
nucleotide
• deoxyribose
• phosphate group
• nitrogenous bases
adenine (A), guanine (G), cytosine (C), thymine (T)
• polydeoxyribonucleotide
DNA STRUCTURE
• almost always found in a double-stranded form (via hydrogen bonds between bases).
• Complementary base pairing: A - T , G - C
[Figure: two base-paired strands (A-T, G-C) with their 5’ and 3’ ends labeled]
• Antiparallel strands of DNA
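The pairing rules (A-T, G-C) and the antiparallel arrangement together mean that the partner of a strand, read 5’ to 3’, is its reverse complement. A minimal Python sketch of that idea follows; the function name and the example sequence are illustrative and not taken from the slides.

```python
# Watson-Crick pairing rules from the slide: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written in the 5'->3' direction."""
    # Complement every base, then reverse the order, because the partner
    # strand runs in the opposite (antiparallel) direction.
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3.upper()))


top = "ATGGCA"  # made-up example sequence
partner = reverse_complement(top)

print("5'-" + top + "-3'")
print("3'-" + partner[::-1] + "-5'")  # partner aligned base-by-base under the top strand
print("partner read 5'->3':", partner)
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.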
Various conformations of DNA
• The B-conformation is the typical Watson-Crick double helix (physiological form).
• The Z-conformation is a left-handed helix that forms with certain base sequences.
• The A-conformation occurs when DNA is dehydrated.
GENE
GENE: a specific sequence of nucleotides in DNA or RNA that controls the transmission and expression of one or more traits by specifying the structure of a protein or RNA
GENE = THE BASIC UNIT OF HEREDITY
Prokaryotic gene
Eukaryotic gene
Coding region encodes the amino acid sequence of a polypeptide.
GENE EXPRESSION
Typical human genes include:
• Regulatory sequences: promoter, enhancer, silencer
• Coding regions (code for protein): exons
• Non-coding regions: introns (interspersed between exons), 5’ and 3’ untranslated regions (UTRs)
Question: Does the entire sequence of all exons in an mRNA code for a polypeptide?
Exon: a segment of a gene that is represented in the mature RNA product. Individual exons may contain coding DNA and/or non-coding DNA (untranslated sequences).
GENE AND GENOME
• The term genome refers to the complete complement of DNA for a given species.
Organism                              Genome size (bases)   Estimated genes
Human (Homo sapiens)                  3 billion             25,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9,700                 9
• Mitochondrial genome: closed circular double-stranded DNA of 16,569 bp
• It encodes 37 genes: 2 rRNAs, 22 tRNAs, and 13 protein subunits of respiratory chain complexes (I, III, IV, V)
• Nuclear genome: ~20,000-25,000 genes
Coding DNA
- represents only ~ 3% of the genome
- encodes the amino acid sequence of a polypeptide, or a functional mature RNA*
* some gene products are RNA (estimated 3000-4000 genes out of 25000 total genes).
General facts about human genome
Gene number:
37 genes (mitochondrial genome)
~ 20,000 – 25,000 genes (nuclear genome)
Gene density:
One gene per 0.45 kb on average (mitochondrial genome)
One gene per ~120-150 kb on average (nuclear genome: ~3,000 Mb divided by ~20,000-25,000 genes)
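Average gene density is simply genome size divided by gene count. The sketch below redoes that division using the figures quoted in these slides (16,569 bp and 37 genes for the mitochondrial genome; about 3 x 10^9 bp and 20,000-25,000 genes for the nuclear genome). The function name is illustrative. The result is only an average and is sensitive to the gene count used: a larger assumed gene count gives a proportionally smaller spacing.

```python
def kb_per_gene(genome_size_bp: float, gene_count: int) -> float:
    """Average amount of DNA per gene, in kilobases."""
    return genome_size_bp / gene_count / 1_000


mito = kb_per_gene(16_569, 37)            # mitochondrial genome
nuclear_low = kb_per_gene(3e9, 25_000)    # upper gene-count estimate
nuclear_high = kb_per_gene(3e9, 20_000)   # lower gene-count estimate

print(f"mitochondrial: one gene per {mito:.2f} kb")
print(f"nuclear: one gene per {nuclear_low:.0f}-{nuclear_high:.0f} kb")
```

On these numbers the mitochondrial genome is a few hundred times more gene-dense than the nuclear genome.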
Gene size: Average 10-15 kb, but enormous variation

Gene                Gene size (kb)   Number of exons   Average exon size (bp)   Average intron size (bp)
Histone H4          0.4              1                 300                      -
tRNA                0.1              2                 50                       20
insulin             1.4              3                 155                      480
β-globin            1.6              3                 150                      490
Class I HLA         3.5              8                 187                      260
serum albumin       18               14                137                      1,100
type VII collagen   31               118               77                       90
complement C3       41               29                122                      900
factor VIII         186              26                375                      7,100
CFTR                250              27                227                      9,100
Dystrophin          2,400            79                180                      30,000
Exon number: Generally correlated with gene size (but shows wide variation)
The human genome contains about 12% single exonic genes (Sakharkar et al. 2004).
Exon size: On average, 200 bp (comparatively little length variation)
Intron size: Enormous variation (strong correlation with gene size)
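Because exon size stays near 200 bp while intron size grows with gene size, the exonic share of a gene falls sharply as genes get larger. The sketch below illustrates this with a few rows copied from the gene-size table in these slides. The function name and the choice of rows are illustrative, and the table values are rounded averages, so the percentages are approximate.

```python
# (gene, gene size in kb, number of exons, average exon size in bp),
# copied from the gene-size table in the slides.
GENES = [
    ("histone H4", 0.4, 1, 300),
    ("insulin", 1.4, 3, 155),
    ("serum albumin", 18, 14, 137),
    ("factor VIII", 186, 26, 375),
    ("dystrophin", 2400, 79, 180),
]


def exon_fraction(size_kb: float, n_exons: int, avg_exon_bp: float) -> float:
    """Approximate fraction of the gene's length that is exon sequence."""
    return n_exons * avg_exon_bp / (size_kb * 1_000)


for name, size_kb, n_exons, avg_exon_bp in GENES:
    print(f"{name:14s} {exon_fraction(size_kb, n_exons, avg_exon_bp):6.1%}")
```

With these rows the exonic share drops from about three quarters for histone H4 to well under 1% for dystrophin.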
Gene orientation:
• head-to-tail (genes in the same orientation)
• head-to-head (divergent) or tail-to-tail (convergent)
• overlap
• nested gene
- Some human genes can be found within other genes, e.g. most small nucleolar RNA (snoRNA) genes are located within genes encoding ribosome-associated proteins or nucleolar proteins.
- About 6% of human genes reside in introns of other genes.
Pseudogene: a DNA sequence that shows a high degree of sequence homology to a nonallelic functional gene but is itself nonfunctional.
• nonprocessed pseudogene: a gene copy that has been inactivated (made non-functional) because its nucleotide sequence has been changed by mutation.
Gene fragments: likely to have originated from unequal crossover or sister chromatid exchange
Tandem repeats: TTAGGG TTAGGG TTAGGG TTAGGG (copies lie directly next to one another)
Interspersed repeats: TACTCTACG ... TACTCTACG (copies are scattered through the genome)
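The difference between the two arrangements can be made concrete by counting back-to-back copies of a repeat unit. The sketch below is a minimal illustration using Python's `re` module. The function name and the flanking and spacer bases in the example sequences are made up for the example and are not taken from the slides.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # Each match is one uninterrupted run of whole copies of the unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Tandem arrangement: four TTAGGG units in a row (flanking bases are made up).
tandem = "CC" + "TTAGGG" * 4 + "AT"
# Interspersed arrangement: two copies separated by unrelated sequence (made up).
interspersed = "TACTCTACG" + "GGATCCA" + "TACTCTACG"

print(longest_tandem_run(tandem, "TTAGGG"))           # copies in the longest run
print(longest_tandem_run(interspersed, "TACTCTACG"))  # no two copies are adjacent
print(interspersed.count("TACTCTACG"))                # total copies present
```

A tandem array gives a long run, while an interspersed repeat gives runs of length one even though several copies are present.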
Tandemly repeated noncoding DNA (3 classes)
1. Satellite DNAs
- blocks often from 100,000 bp to several Mb in length
- size of repeats: 5 to 171 bp
- major chromosomal location: centromeres
- function: not clear, might be important for centromere function
2. Minisatellite DNAs
- blocks often within the 100-20,000 bp range
- size of repeats: 6 to 64 bp
- major chromosomal location: at or close to telomeres
- function: recombination hot spot?
2.1 telomeric family
2.2 hypervariable family: the number of repeats increases or decreases between generations (highly polymorphic), so these are used as markers in DNA fingerprinting
3. Microsatellite DNAs
- blocks often less than 150 bp
- size of repeats: 1 to 4 bp
- major chromosomal location: dispersed throughout all chromosomes
- function: not well understood
- CA repeats: 0.5% of nuclear genome
- CT repeats: 0.2% of nuclear genome
- runs of A or T: 0.3% of nuclear genome
- tri- or tetranucleotide repeats: rare
Interspersed repetitive noncoding DNA

Class    Family         Size                               Number of copies   % of genome
SINE     Alu            ~0.3 kb                            ~1,000,000         ~7%
SINE     MIR            ~0.13 kb                           ~400,000           ~1.7%
LINE     LINE-1 (Kpn)   6.1 kb (but most are truncated)    ~300,000           ~5%
Others   various        ~0.4 kb                            ~800,000           ~10%

SINE = short interspersed element; LINE = long interspersed element
• Alu repeats are very common (about once every 3 kb).
• The function of Alu is unknown; it is speculated to promote unequal recombination, which may be evolutionarily advantageous by promoting gene duplication.
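The "once every 3 kb" figure for Alu follows from dividing the genome size by the copy number. The sketch below repeats that division for the families listed in these slides, using the ~3,000 Mb nuclear genome size quoted elsewhere in the deck. The function and variable names are illustrative, and the result is an average that assumes nothing about how evenly the copies are distributed.

```python
GENOME_KB = 3_000_000  # ~3,000 Mb nuclear genome, as quoted in the slides

# Approximate copy numbers from the interspersed-repeat table.
COPIES = {"Alu": 1_000_000, "MIR": 400_000, "LINE-1": 300_000}


def average_spacing_kb(copies: int, genome_kb: float = GENOME_KB) -> float:
    """Average distance between copies if they were spread evenly."""
    return genome_kb / copies


for family, copies in COPIES.items():
    print(f"{family}: one copy every {average_spacing_kb(copies):.1f} kb on average")
```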
Nuclear genome (~3,000 Mb)
• Genes and gene-related sequences: 1,200 Mb
- Coding DNA: 48 Mb
- Non-coding DNA: 1,200 Mb (pseudogenes, gene fragments, introns, UTRs)
• Extragenic DNA: 2,000 Mb
- Interspersed repeats: 1,400 Mb (LINEs 640 Mb, SINEs 420 Mb, LTR elements 250 Mb, DNA transposons 90 Mb)
- Other intergenic regions: 600 Mb (microsatellites 90 Mb, others 510 Mb)
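To put the interspersed-repeat figures in proportion, the sketch below converts the megabase values given for the nuclear genome into percentages of the ~3,000 Mb total and checks that the four classes add up to the 1,400 Mb quoted for interspersed repeats. The variable names are illustrative, and the slide values are themselves rounded estimates.

```python
GENOME_MB = 3_000  # approximate nuclear genome size from the slide

# Interspersed repeat classes and their sizes in Mb, as listed on the slide.
INTERSPERSED_MB = {
    "LINEs": 640,
    "SINEs": 420,
    "LTR elements": 250,
    "DNA transposons": 90,
}

total_repeats_mb = sum(INTERSPERSED_MB.values())

for name, size_mb in INTERSPERSED_MB.items():
    print(f"{name:16s} {size_mb:5d} Mb  {size_mb / GENOME_MB:5.1%} of genome")

print(f"{'all interspersed':16s} {total_repeats_mb:5d} Mb  {total_repeats_mb / GENOME_MB:5.1%} of genome")
```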
• Only about 3% of the human genome actually codes for proteins
• A lot of the genome is “junk”: why so much?
• Pelagibacter ubique, one of the smallest self-replicating cells known, has almost no junk DNA in its genome
CHROMOSOME STRUCTURE
From gene to chromosome
• The human genome contains 3 x 10^9 bp (haploid). At about 0.34 nm per base pair, that much DNA laid end to end would measure about one meter, so the 46 chromosomes of a single diploid cell together hold about two meters of DNA.
• However, in humans as well as other eukaryotes, genomic DNA is highly folded, constrained, and compacted by histone and non-histone proteins into chromatin and chromosomes.
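The length estimate can be reproduced by multiplying the number of base pairs by the rise per base pair. The sketch below does this, assuming the standard B-DNA value of about 0.34 nm per base pair, which is not stated on the slide. The function name is illustrative.

```python
RISE_PER_BP_NM = 0.34  # assumed B-DNA rise per base pair; not stated on the slide


def dna_length_m(base_pairs: float) -> float:
    """Contour length in meters of fully extended B-form DNA."""
    return base_pairs * RISE_PER_BP_NM * 1e-9  # nm -> m


haploid_m = dna_length_m(3e9)       # one copy of the genome
diploid_m = dna_length_m(2 * 3e9)   # both copies, i.e. all 46 chromosomes

print(f"haploid genome: {haploid_m:.2f} m")
print(f"diploid cell:   {diploid_m:.2f} m")
```

Under this assumption one copy of the genome comes to roughly a meter of DNA, which is what makes compaction into chromatin necessary.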
• Eukaryotic DNA is associated with histone proteins.
• Histones are small (102 to 135 amino acids) proteins that contain a very high proportion of positively charged amino acids such as lysine and arginine. Thus, they have high affinity for DNA (a negatively charged molecule).
Level 1: Nucleosome - the most fundamental unit of packaging
• The nucleosome core particle consists of a histone core octamer (two copies each of H2A, H2B, H3, and H4) and 146 bp of DNA wrapped 1.75 turns around the core.
Nucleosome (~200 bp) = nucleosome core particle (146 bp) + linker DNA
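These numbers allow a rough count of how many nucleosomes a genome needs. The sketch below is back-of-the-envelope arithmetic that assumes the whole 3 x 10^9 bp haploid genome is packaged at one nucleosome per 200 bp, which is a simplification. The variable names are illustrative.

```python
GENOME_BP = 3_000_000_000      # haploid human genome, from the slides
NUCLEOSOME_REPEAT_BP = 200     # core particle plus linker DNA
CORE_BP = 146                  # DNA wrapped around the histone octamer
HISTONES_PER_CORE = 8          # two copies each of H2A, H2B, H3, H4

nucleosomes = GENOME_BP // NUCLEOSOME_REPEAT_BP
linker_bp = NUCLEOSOME_REPEAT_BP - CORE_BP
core_fraction = CORE_BP / NUCLEOSOME_REPEAT_BP
core_histones = nucleosomes * HISTONES_PER_CORE

print(f"nucleosomes per haploid genome: {nucleosomes:,}")
print(f"linker DNA per nucleosome: {linker_bp} bp")
print(f"fraction of DNA wrapped on cores: {core_fraction:.0%}")
print(f"core histone molecules needed: {core_histones:,}")
```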
Level 2: 30-nm chromatin fiber
• Histone H1 brings nucleosomes together
• DNA is 40-fold more compact
Level 3: Radial loop scaffold
• Scaffold proteins loop the 30-nm fiber
• Specific, repeated DNA sequences interact with the scaffold proteins
Level 4: Radial loop scaffold
• Additional looping and gathering of loops
• 10,000-fold more compact at metaphase
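To get a feel for the compaction factors quoted for the packaging levels, the sketch below applies them to a single chromosome. It is a rough illustration only: the 100 Mb chromosome size is hypothetical, the 0.34 nm rise per base pair is the standard B-DNA value rather than a figure from the slides, and the function name is made up for the example.

```python
RISE_PER_BP_NM = 0.34          # assumed B-DNA rise per base pair (not on the slides)
CHROMOSOME_BP = 100_000_000    # hypothetical 100 Mb chromosome, for illustration only


def packed_length_um(base_pairs: int, fold_compaction: float) -> float:
    """Length in micrometers after compacting extended DNA by the given factor."""
    extended_um = base_pairs * RISE_PER_BP_NM / 1_000  # nm -> um
    return extended_um / fold_compaction


# Compaction factors for the 30-nm fiber and metaphase are those quoted on the slides.
for label, fold in [("extended DNA", 1), ("30-nm fiber", 40), ("metaphase", 10_000)]:
    print(f"{label:13s} {packed_length_um(CHROMOSOME_BP, fold):>10,.1f} um")
```

On these assumptions a molecule several centimeters long ends up only a few micrometers long at metaphase.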
Roles of chromatin structure on cellular functions
• Packing long DNA into compact chromosomes during cell division.
- Packaging of DNA into chromatin and chromosomes efficiently compacts it in the nucleus (~10,000-fold more compact).
• Controlling gene expression by altering chromatin structure.
Controlling of gene expression by altering chromatin structures
• The packaging of DNA into nucleosomes is generally regarded as a block to transcription, presumably because the nucleosome interferes with binding of activators.
• The affinity of a transcription factor for its binding site on DNA is decreased when the DNA is reconstituted into nucleosomes.
Heterochromatin = the portion of chromatin that remains relatively compacted during interphase and is transcriptionally inactive. It probably consists of closely packed regions of 30-nm chromatin fiber.
Example: condensation of one X chromosome in cells derived from females, seen as the Barr body (or as the drumstick appendage on neutrophil nuclei).
Euchromatin = the more diffuse region of the interphase chromosome, consisting of less dense chromatin.
• Modification of histones (acetylation, methylation) can change the chromatin structure and hence the level of gene expression.
Important features of chromosome
• The centromere is required for attachment to the spindle at mitosis, so that chromosomes segregate into the new cells
• Telomeres protect the ends of chromosomes
• Replication origins are where DNA replication starts
FEATURES OF CHROMOSOME
CENTROMERE
• Holds sister chromatids together
• Binds spindle fibers, allowing segregation
• In mammals, it consists of blocks of satellite DNA.
• Tightly condensed chromatin structure (heterochromatin)
TELOMERE
• protects the ends of chromosomes from degradation and loss of DNA sequence
• consists of 10-15 kb of TTAGGG repeats (telomeric family of minisatellite DNAs)
[Figure: removal of the terminal primer leaves a gap, so the DNA ends (telomeres) shorten with each replication]
Excessive shortening of telomeres may reach genes and disrupt their coding regions, leading to aging and cell death.
This is seen in somatic cells, e.g. skin cells (keratinocytes) and fibroblasts.
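The link between telomere length and the number of divisions a somatic cell line can undergo can be sketched as a simple countdown. In the illustration below only the 10-15 kb starting lengths come from the slides. The loss per division (100 bp) and the critical length (5,000 bp) are hypothetical placeholder values, as is the function name, so the output shows the shape of the argument rather than measured biology.

```python
def divisions_until_critical(telomere_bp: int,
                             loss_per_division_bp: int,
                             critical_bp: int) -> int:
    """Count divisions possible before the telomere would drop below critical_bp."""
    divisions = 0
    # Keep dividing while one more round still leaves at least the critical length.
    while telomere_bp - loss_per_division_bp >= critical_bp:
        telomere_bp -= loss_per_division_bp
        divisions += 1
    return divisions


LOSS_PER_DIVISION_BP = 100   # hypothetical value, for illustration only
CRITICAL_BP = 5_000          # hypothetical threshold, for illustration only

for start_bp in (10_000, 15_000):  # 10-15 kb telomeres, as on the slide
    n = divisions_until_critical(start_bp, LOSS_PER_DIVISION_BP, CRITICAL_BP)
    print(f"start {start_bp:,} bp -> {n} divisions before reaching the critical length")
```

A cell that expresses telomerase would offset the per-division loss, which in this sketch corresponds to a loss of zero and no limit on divisions.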
In the germ line, telomerase is expressed to maintain telomere length.
Abnormal expression may be found in neoplastic cells.
REPLICATION ORIGIN
• Sites of DNA replication initiation
• Sequences recognized by initiator proteins
• Multiple origins are needed to replicate a chromosome efficiently
• Example: yeast’s autonomously replicating sequence (ARS)
HUMAN ARTIFICIAL CHROMOSOME
An ideal vehicle for gene delivery
• Large insert capacity
• Predictable gene expression (endogenous machinery)
• Stable inheritance without integration
• Non-immunogenic