
SIBC 511: STRUCTURE AND BEHAVIOR OF GENE AND CHROMOSOME

Chatchawan Srisawat M.D., Ph.D.

nucleotide

• deoxyribose

• phosphate group

• nitrogenous bases

adenine (A), guanine (G), cytosine (C), thymine (T)

• polydeoxyribonucleotide

DNA STRUCTURE

• DNA is almost always found in a double-stranded form (held together by hydrogen bonds between bases).

• Complementary base pairing: A - T , G - C

• Antiparallel strands of DNA: one strand runs 5’ to 3’ and its partner runs 3’ to 5’.
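Complementary base pairing and antiparallel orientation together determine the partner of any strand. The sketch below is a minimal illustration; the function name `reverse_complement` is chosen here and does not come from the slides.

```python
# Watson-Crick pairs from the slide: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3'.

    The partner is read in the opposite direction (antiparallel),
    so the input is reversed before each base is complemented.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick check that the pairing table is symmetric.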

Various conformations of DNA

• The B-conformation is the typical Watson-Crick double helix (the physiological form).

• The Z-conformation is a left-handed helix that forms with certain base sequences.

• The A-conformation occurs when DNA is dehydrated.

• Major and minor grooves are important for DNA-protein interactions.

GENE

GENE: a specific sequence of nucleotides in DNA or RNA that controls the transmission and expression of one or more traits by specifying the structure of a protein or RNA

GENE = THE BASIC UNIT OF HEREDITY

[Figure: prokaryotic gene and eukaryotic gene]

The coding region encodes the amino acid sequence of a polypeptide.

GENE EXPRESSION

Typical human genes include:

• Regulatory sequences: promoter, enhancer, silencer

• Coding regions (code for protein): exons

• Non-coding regions: introns (interspersed between exons), 5’ and 3’ untranslated regions (UTRs)

Question

Does the whole sequence of the exons in an mRNA code for a polypeptide?

Exon: a segment of a gene that is represented in the mature RNA product. Individual exons may contain coding DNA and/or non-coding DNA (untranslated sequences).
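The exon definition can be made concrete with a toy transcript. The sketch below uses an entirely hypothetical three-exon mRNA (the coordinates, `cds_start` and `cds_end` are invented for illustration) and reports how much of each exon is coding and how much is untranslated.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    start: int  # 0-based position in the mature mRNA (hypothetical)
    end: int    # exclusive end position


# Hypothetical spliced mRNA, 900 nt long, built from three exons.
exons = [Exon(0, 200), Exon(200, 650), Exon(650, 900)]
cds_start, cds_end = 120, 780  # hypothetical limits of the coding sequence


def coding_length(exon: Exon) -> int:
    """Number of nucleotides of this exon that fall inside the coding sequence."""
    return max(0, min(exon.end, cds_end) - max(exon.start, cds_start))


for i, exon in enumerate(exons, start=1):
    total = exon.end - exon.start
    coding = coding_length(exon)
    print(f"exon {i}: {total} nt, {coding} coding, {total - coding} untranslated")
```

In this made-up example the first and last exons are only partly coding, because they carry the 5’ and 3’ UTRs.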

• The term genome refers to the complete complement of DNA for a given species.

GENE AND GENOME

Organism                              Genome size (bases)   Estimated genes
Human (Homo sapiens)                  3 billion             25,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9,700                 9
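The table shows that genome size and gene number do not scale together. One way to see this is to express each row as genes per megabase; the sketch below does only that arithmetic, using the values from the table (the helper name `genes_per_mb` is chosen here).

```python
# (genome size in bases, estimated gene count), copied from the table above.
genomes = {
    "Human": (3_000_000_000, 25_000),
    "Laboratory mouse": (2_600_000_000, 30_000),
    "Mustard weed": (100_000_000, 25_000),
    "Roundworm": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "E. coli": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}


def genes_per_mb(bases: int, genes: int) -> float:
    """Gene count divided by genome size in megabases."""
    return genes / (bases / 1_000_000)


for name, (bases, genes) in genomes.items():
    print(f"{name:18s} {genes_per_mb(bases, genes):8.1f} genes per Mb")
```

Small genomes such as E. coli and yeast come out far denser in genes than the human or mouse genome.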

Human mitochondrial genome

• closed circular double-stranded DNA consisting of 16,569 bp

• encodes 37 genes: 2 rRNAs, 22 tRNAs, and 13 protein subunits of respiratory chain complexes (I, III, IV, V)

Coding DNA

- represents only ~3% of the genome

- encodes the amino acid sequence of a polypeptide, or a functional mature RNA*

* Some gene products are RNA (an estimated 3,000-4,000 of the ~25,000 total genes).

General facts about human genome

Gene number:

37 genes (mitochondrial genome)

~20,000-25,000 genes (nuclear genome)

Gene density:

One gene per 0.45 kb (mitochondrial genome)

On average, about one gene per 40-45 kb (nuclear genome)

Gene size: average 10-15 kb, but enormous variation

Gene                Gene size (kb)   Number of exons   Average exon size (bp)   Average intron size (bp)
Histone H4          0.4              1                 300                      -
tRNA                0.1              2                 50                       20
insulin             1.4              3                 155                      480
β-globin            1.6              3                 150                      490
Class I HLA         3.5              8                 187                      260
serum albumin       18               14                137                      1,100
type VII collagen   31               118               77                       90
complement C3       41               29                122                      900
factor VIII         186              26                375                      7,100
CFTR                250              27                227                      9,100
Dystrophin          2,400            79                180                      30,000


Exon number: generally correlated with gene size (but shows wide variation)

About 12% of human genes are single-exon genes (Sakharkar et al. 2004).


Exon size: On average, 200 bp (comparatively little length variation)

Intron size: Enormous variation (strong correlation with gene size)
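Because exon size varies little while intron size grows with gene size, the exonic share of a gene shrinks as genes get larger. The sketch below estimates that share from three columns of the gene table (gene size, exon number, average exon size); the helper name `exon_fraction` is chosen here, and the result is only as precise as the rounded table values.

```python
# (gene size in kb, number of exons, average exon size in bp), from the gene table.
genes = {
    "insulin": (1.4, 3, 155),
    "beta-globin": (1.6, 3, 150),
    "serum albumin": (18, 14, 137),
    "factor VIII": (186, 26, 375),
    "CFTR": (250, 27, 227),
    "dystrophin": (2400, 79, 180),
}


def exon_fraction(size_kb: float, n_exons: int, avg_exon_bp: float) -> float:
    """Approximate share of the gene length that is occupied by exons."""
    return (n_exons * avg_exon_bp) / (size_kb * 1000)


for name, row in genes.items():
    print(f"{name:14s} {exon_fraction(*row):6.1%} exonic")
```

With these numbers, roughly a third of the insulin gene is exonic, compared with well under 1% of the dystrophin gene.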


Gene orientation:

• head-to-tail

• head-to-head or tail-to-tail

• overlap

• nested gene

- Some human genes are found within other genes, e.g. most small nucleolar RNA (snoRNA) genes are located within genes for ribosome-associated proteins or nucleolar proteins.

- About 6% of human genes reside in introns of other genes.

Pseudogene: a DNA sequence which shows a high degree of sequence homology to a nonallelic functional gene but which is itself nonfunctional.

• nonprocessed pseudogene: a gene that has been inactivated (made non-functional) because its nucleotide sequence has been changed by mutation.

• processed pseudogene: a gene copy that lacks introns and control regions; without a control region it is non-functional.

Gene fragments: likely to have originated from unequal crossover or sister chromatid exchange

Tandem repeats: TTAGGG TTAGGG TTAGGG TTAGGG

Interspersed repeats: TACTCTACG ... TACTCTACG
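The difference between the two arrangements can be checked by counting back-to-back copies of a repeat unit. The sketch below is one simple way to do that with a regular expression; the function name and the spacer sequence between the interspersed copies are invented for the example.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


tandem = "TTAGGG" * 4
interspersed = "TACTCTACG" + "GATTACAGATT" + "TACTCTACG"  # spacer is made up

print(longest_tandem_run(tandem, "TTAGGG"))           # 4
print(longest_tandem_run(interspersed, "TACTCTACG"))  # 1
```

A tandem repeat gives a run longer than one, while interspersed copies are each found as isolated single units.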

Tandemly repeated noncoding DNA (3 classes)

1. Satellite DNAs
   - blocks often from 100,000 bp to several Mb in length
   - major chromosomal location: centromeres
   - function: not clear, might be important for centromere function
   - size of repeats: 5 to 171 bp

2. Minisatellite DNAs
   - blocks often within the 100-20,000 bp range
   - major chromosomal location: at or close to telomeres
   - function: recombination hot spot?
   - size of repeats: 6 to 64 bp
   2.1 telomeric family
   2.2 hypervariable family: the number of repeats increases or decreases between generations (highly polymorphic), so these are used as markers in DNA fingerprinting

3. Microsatellite DNAs
   - blocks often less than 150 bp
   - major chromosomal location: dispersed throughout all chromosomes
   - function: not well understood
   - size of repeats: 1 to 4 bp
   - CA repeats: 0.5% of nuclear genome
   - CT repeats: 0.2% of nuclear genome
   - runs of A or T: 0.3% of nuclear genome
   - tri- or tetranucleotide repeats: rare

[Figure: chromosomal location of major repetitive DNA classes (satellite, minisatellite, microsatellite)]

Interspersed repetitive noncoding DNA

Class    Family         Size                               Number of copies   % of genome
SINE     Alu            ~0.3 kb                            ~1,000,000         ~7%
SINE     MIR            ~0.13 kb                           ~400,000           ~1.7%
LINE     LINE-1 (Kpn)   6.1 kb (but most are truncated)    ~300,000           ~5%
Others   various        ~0.4 kb                            ~800,000           ~10%

SINE = short interspersed element; LINE = long interspersed element

• Alu repeats are very common (once every 3 kb).

• The function of Alu is unknown (speculated to promote unequal recombination, which may be evolutionarily advantageous in promoting gene duplication).
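The "once every 3 kb" figure for Alu follows from dividing the genome size by the copy number. The sketch below repeats that division for the families in the table, assuming a ~3,000 Mb nuclear genome and evenly spread copies (a simplification; real copies are not evenly spaced).

```python
GENOME_BP = 3_000_000_000  # nuclear genome size quoted in these slides

# Approximate copy numbers from the interspersed-repeat table.
copies = {"Alu": 1_000_000, "MIR": 400_000, "LINE-1": 300_000}


def mean_spacing_kb(copy_number: int, genome_bp: int = GENOME_BP) -> float:
    """Average distance between copies if they were spread evenly."""
    return genome_bp / copy_number / 1000


for family, n in copies.items():
    print(f"{family}: one copy every {mean_spacing_kb(n):.1f} kb on average")
```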

[Figure: location of repetitive DNAs in the human retinoblastoma susceptibility gene]

Summary of repetitive DNA classes:

• Tandem repeats: satellite, minisatellite, microsatellite

• Interspersed repeats: LINE, SINE, transposon

Nuclear genome (~3,000 Mb)

• Genes and gene-related sequences: 1,200 Mb
   - Coding DNA: 48 Mb
   - Non-coding DNA (pseudogenes, gene fragments, introns, UTRs): 1,200 Mb

• Extragenic DNA: 2,000 Mb
   - Interspersed repeats: 1,400 Mb (LINEs 640 Mb, SINEs 420 Mb, LTR 250 Mb, DNA transposons 90 Mb)
   - Other intergenic regions: 600 Mb (microsatellites 90 Mb, others 510 Mb)
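The extragenic part of this breakdown can be checked by addition. The sketch below assumes the grouping implied by the totals (LINEs, SINEs, LTR and DNA transposons under interspersed repeats; microsatellites and "others" under other intergenic regions) and simply sums the quoted sizes.

```python
# Sizes in Mb, taken from the genome breakdown above.
interspersed_repeats = {"LINEs": 640, "SINEs": 420, "LTR": 250, "DNA transposons": 90}
other_intergenic = {"Microsatellites": 90, "Others": 510}

interspersed_total = sum(interspersed_repeats.values())
other_total = sum(other_intergenic.values())
extragenic_total = interspersed_total + other_total

print(interspersed_total)  # 1400
print(other_total)         # 600
print(extragenic_total)    # 2000

# Share of each interspersed repeat family within extragenic DNA.
for name, mb in interspersed_repeats.items():
    print(f"{name}: {mb / extragenic_total:.1%} of extragenic DNA")
```

The sums reproduce the 1,400 Mb, 600 Mb and 2,000 Mb totals, which supports reading the figure with this grouping.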

• Only about 3% of the human genome actually codes for proteins.

• Much of the genome is “junk”: why so much?

• Pelagibacter ubique is one of the smallest self-replicating cells known (almost no junk DNA in its genome).

CHROMOSOME STRUCTURE

From gene to chromosome

• The human genome contains 3 × 10^9 bp. Linked end to end, this DNA would measure about one meter in length (about two meters for all 46 chromosomes of a diploid cell).

• However, in humans as well as other eukaryotes, genomic DNA is highly folded, constrained, and compacted by histone and non-histone proteins into chromatin and chromosomes.
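The one-meter figure can be reproduced from the base-pair count. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA, a standard textbook value that is not stated in the slides; the function name is chosen here.

```python
RISE_PER_BP_NM = 0.34  # B-form DNA rise per base pair; textbook value, not from the slides


def contour_length_m(base_pairs: float) -> float:
    """Length in meters of fully extended B-form DNA with the given number of base pairs."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


print(f"{contour_length_m(3e9):.2f} m")  # 1.02 m
```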

• Eukaryotic DNA is associated with histone proteins.

• Histones are small proteins (102 to 135 amino acids) that contain a very high proportion of positively charged amino acids such as lysine and arginine. Thus, they have high affinity for DNA, which is negatively charged.

Level 1: Nucleosome - the most fundamental unit of packaging

• The nucleosome core particle consists of a histone core octamer (two copies each of H2A, H2B, H3, and H4) and 146 bp of DNA wrapped 1.75 turns around the core.

• Nucleosome (200 bp) = nucleosome core particle (146 bp) + linker DNA
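The nucleosome numbers give the linker length and a rough nucleosome count directly. The sketch below uses the 200 bp repeat and 146 bp core from the slide and the 3 × 10^9 bp genome size quoted earlier; treating the repeat length as uniform is a simplification.

```python
NUCLEOSOME_REPEAT_BP = 200  # core particle plus linker, from the slide
CORE_PARTICLE_BP = 146      # DNA wrapped around the histone octamer

linker_bp = NUCLEOSOME_REPEAT_BP - CORE_PARTICLE_BP


def nucleosome_count(genome_bp: int) -> int:
    """Approximate number of nucleosomes needed to package `genome_bp` of DNA."""
    return genome_bp // NUCLEOSOME_REPEAT_BP


print(linker_bp)                        # 54
print(nucleosome_count(3_000_000_000))  # 15000000
```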

Level 2: 30-nm chromatin fiber

• Histone H1 brings nucleosomes together

• DNA is 40-fold more compact

[Figure: 10-nm fiber and 30-nm fiber]

Level 3: Radial loop scaffold

• Scaffold proteins loop the 30-nm fiber

• Specific, repeated DNA sequences interact with the scaffold proteins

Level 4: Radial loop scaffold

• Additional looping and gathering of loops

• 10,000-fold more compact at metaphase

Roles of chromatin structure on cellular functions

• Packing long DNA into compact chromosomes during cell division

  - Packaging DNA into chromatin and chromosomes compacts it efficiently within the nucleus (~10,000-fold more compact).

• Controlling gene expression by altering chromatin structure

Controlling gene expression by altering chromatin structure

• The packaging of DNA into nucleosomes is generally regarded as a block to transcription, presumably because the nucleosome interferes with the binding of activators.

• The affinity of a transcription factor for its binding site on DNA decreases when the DNA is reconstituted into nucleosomes.

[Figure: transcription factor, transcriptional element, and transcription]

Heterochromatin = the portion of the chromatin that remains relatively compacted during interphase and is transcriptionally inactive. It probably consists of closely packed regions of 30-nm chromatin fiber.

Example: condensation of the X chromosome in cells derived from females (visible as a Barr body or drumstick).

Euchromatin = the more diffuse region of the interphase chromosome, consisting of less dense chromatin.

• Modification of histones (acetylation, methylation) can change the chromatin structure and hence the level of gene expression.

Important features of chromosome

• The centromere attaches to the spindle at mitosis so that chromosomes segregate into the new cells.

• Telomeres protect the ends of chromosomes.

• Replication origins are where DNA replication starts.

FEATURES OF CHROMOSOME

CENTROMERE

• Holds sister chromatids together

• Binds spindle fibers, allowing segregation

• In mammals, it consists of blocks of satellite DNA.

• Tightly condensed chromatin structure (heterochromatin)

TELOMERE

• protects the ends of chromosomes from degradation and loss of DNA sequence

• consists of 10-15 kb of TTAGGG repeats (telomeric family of minisatellite DNAs)

Shortening of the DNA ends (telomeres) with each replication

[Figure: removal of the terminal primer leaves a gap at the 5’ end of each newly synthesized strand, so the DNA ends shorten with each round of replication]

Excessive shortening of telomeres may extend into genes and disrupt their coding regions, leading to aging and cell death.

This shortening occurs in somatic cells, e.g. skin cells (keratinocytes), fibroblasts, etc.

In the germ line, telomerase is expressed to maintain telomere length.

Abnormal telomerase expression may be found in neoplastic cells.
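Progressive telomere loss can be sketched as a simple countdown. In the example below only the 10 kb starting length comes from the slides; the loss per division (100 bp) and the critical length (5 kb) are assumed values picked purely for illustration, and the function name is hypothetical.

```python
def divisions_until(initial_bp: int, loss_per_division_bp: int, critical_bp: int) -> int:
    """Count cell divisions until the telomere falls to or below `critical_bp`.

    `loss_per_division_bp` must be positive, otherwise the telomere never shortens.
    """
    if loss_per_division_bp <= 0:
        raise ValueError("loss_per_division_bp must be positive")
    length, divisions = initial_bp, 0
    while length > critical_bp:
        length -= loss_per_division_bp
        divisions += 1
    return divisions


# 10 kb is the lower bound quoted on the slide; the other two numbers are assumptions.
print(divisions_until(initial_bp=10_000, loss_per_division_bp=100, critical_bp=5_000))  # 50
```

A cell that expresses telomerase would correspond to no net loss per division, which is why the sketch rejects a non-positive loss instead of looping forever.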

REPLICATION ORIGIN

• Sequences recognized by initiator protein

• Multiple origins are needed to replicate a chromosome efficiently

• Sites of DNA replication initiation

Yeast’s autonomously replicating sequence (ARS)
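The need for multiple origins can be seen with a rough timing estimate. The sketch below assumes evenly spaced origins that all fire at once, a hypothetical 150 Mb chromosome, and a fork speed of 50 bp per second; none of these numbers come from the slides, so the output is an order-of-magnitude illustration only.

```python
def replication_time_hours(length_bp: float, n_origins: int, fork_rate_bp_per_s: float) -> float:
    """Time to copy the DNA if origins are evenly spaced and fire together.

    Each origin launches two forks that move in opposite directions.
    """
    return length_bp / (2 * n_origins * fork_rate_bp_per_s) / 3600


CHROMOSOME_BP = 150e6  # hypothetical chromosome length
FORK_RATE = 50         # bp per second, an assumed order-of-magnitude value

print(f"{replication_time_hours(CHROMOSOME_BP, 1, FORK_RATE):.0f} h with one origin")
print(f"{replication_time_hours(CHROMOSOME_BP, 1000, FORK_RATE):.2f} h with 1000 origins")
```

Under these assumptions a single origin would need weeks, while a thousand origins bring the time down to well under an hour.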

HUMAN ARTIFICIAL CHROMOSOME

An ideal vehicle for gene delivery

• Large insert capacity

• Predictable gene expression (endogenous machinery)

• Stable inheritance without integration

• Non-immunogenic