chapter 5 organization of human genome · 2012. 4. 13. · chapter 5 organization of human genome....
Chapter 5
Organization of Human Genome
Outline of this chapter
Organization of human genome
Gene structure
Human genome: total genetic information in human cells
Nuclear Genome: total DNA in nucleus
Mitochondrial Genome: total DNA in mitochondria
[Figure: cell, with DNA and histones labeled]
Nuclear Genome
3 x 10^9 base pairs
Distributed between 24 different types of linear double-stranded DNA molecules
130 Mb on average, but varying between 50 and 260 Mb
5.1 Organization of human genome
DNA content of human chromosomes
Chromosome   Amount of DNA (Mb)   Chromosome   Amount of DNA (Mb)
1            279 (30)             13           118 (16)
2            251 (3)              14           107 (16)
3            221 (3)              15           100 (17)
4            197 (3)              16           104 (15)
5            198 (3)              17           88 (3)
6            176 (3)              18           86 (3)
7            163 (3)              19           72 (3)
8            148 (3)              20           66 (3)
9            140 (22)             21           45 (11)
10           143 (3)              22           48 (13)
11           148 (3)              X            163 (3)
12           142 (3)              Y            51 (27)
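The per-chromosome values can be checked against the summary figures given for the nuclear genome. The following is a minimal sketch in Python; the dictionary and function names are illustrative and not part of the slides, and the values in parentheses are left out.

```python
# DNA content per chromosome in Mb, copied from the table
# (the parenthesized values are not used here).
CHROMOSOME_MB = {
    "1": 279, "2": 251, "3": 221, "4": 197, "5": 198, "6": 176,
    "7": 163, "8": 148, "9": 140, "10": 143, "11": 148, "12": 142,
    "13": 118, "14": 107, "15": 100, "16": 104, "17": 88, "18": 86,
    "19": 72, "20": 66, "21": 45, "22": 48, "X": 163, "Y": 51,
}


def genome_summary(sizes):
    """Summarize a {chromosome: Mb} mapping."""
    total = sum(sizes.values())
    return {
        "types": len(sizes),            # number of chromosome types
        "total_mb": total,              # summed over all types
        "mean_mb": total / len(sizes),  # average size per type
        "min_mb": min(sizes.values()),
        "max_mb": max(sizes.values()),
    }


summary = genome_summary(CHROMOSOME_MB)
print(summary)
```

With these table values the 24 chromosome types sum to 3254 Mb, about 136 Mb each on average, which is in line with the rounded figures of 3 x 10^9 bp and 130 Mb.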
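Comparison of genome sizes in various organisms (from prokaryotes to mammals).
Genome size in different species
Species                      Mb
Escherichia coli             4.64
Saccharomyces cerevisiae     12.1
Nematode                     100
Fruit fly                    140
Locust                       5000
Mouse                        3300
Pea                          4800
Maize                        5000
Wheat                        17000
Human                        3000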
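C value paradox
The total amount of DNA in the haploid genome of an organism is called its C value.
Higher organisms carry out more complex life activities than lower organisms, so in theory their C values should also be higher. In fact, C values show no trend that tracks evolutionary level: a higher organism does not necessarily have a higher C value than a lower one. This mismatch between total DNA content and biological complexity is called the C value paradox.
It shows up in two ways:
Genomic DNA content is far larger than the expected number of protein-coding genes would require.
Some species differ little in complexity, yet their C values span a very wide range.
In lower eukaryotes, haploid genome DNA content shows some positive correlation with morphological complexity, but this does not hold in higher eukaryotes, whose haploid genome DNA content varies widely.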
According to their reassociation (renaturation) kinetics, eukaryotic DNA sequences can be divided into:
Unique sequences
Moderately repetitive sequences
Highly repetitive sequences
Some polyploid plants have no non-repetitive DNA; even the slowest-renaturing fraction is present in about three copies.
The crab genome has no moderately repetitive sequences, only unique and highly repetitive sequences.
Lower eukaryotes have no highly repetitive sequences.
Denaturation and Renaturation
[Figure: double-stranded DNA (ATGAGCTGTACGATCGTG paired with TACTCGACATGCTAGCAC) is denatured into single-stranded DNA, and the single strands renature into double-stranded DNA]
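Renaturation depends on base pairing between the two strands. As a small illustration, the sketch below checks that the two strands shown on the slide are complementary; the function names are made up for this example.

```python
# Watson-Crick base pairing rules.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand (same left-to-right order)."""
    return "".join(COMPLEMENT[base] for base in strand)


def can_renature(top, bottom):
    """True if bottom, written underneath top, pairs with every base of top."""
    return len(top) == len(bottom) and complement(top) == bottom


top = "ATGAGCTGTACGATCGTG"     # upper strand from the slide
bottom = "TACTCGACATGCTAGCAC"  # lower strand from the slide
print(complement(top))
print(can_renature(top, bottom))
```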
Unique sequences
the most common class, 60% of the human genome
include most of the protein-coding genes
copy number: single or several copies
The reassociation kinetics of mRNA show that most mRNA derives from non-repetitive DNA and the rest from moderately repetitive DNA; no mRNA derives from highly repetitive DNA.
Repetitive sequences
Moderately repetitive sequences
30% of the human genome
copies: 10^2-10^5
contain rRNA genes, tRNA genes, histone genes, immunoglobulin heavy-chain and light-chain genes, and so on
include SINEs and LINEs
Moderately repetitive sequences
Short Interspersed Nuclear Element (SINE)
100 to 400 bp in length
copies: 10^5
for example: Alu repeat sequence
Alu repeat
the most common interspersed repeat in the human genome
300 bp in length, cut by Alu I into two fragments, 170 bp and 130 bp
Alu I recognition site:
5' ...AGCT... 3'
3' ...TCGA... 5'
found only in primates, while other classes of SINEs are common in other mammalian species.
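The Alu I digest described above can be sketched in a few lines of Python. This is an illustration only: the 300-bp sequence is a made-up stand-in with a single AGCT site placed to give the 170 bp and 130 bp fragments, not a real Alu element, and the function name is hypothetical. The cut position (between G and C, leaving blunt ends) is the known behavior of Alu I.

```python
def alu_i_digest(seq):
    """Cut a DNA string at every AGCT site, between G and C (AG^CT)."""
    fragments = []
    start = 0
    pos = seq.find("AGCT")
    while pos != -1:
        cut = pos + 2                  # position just after "AG"
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find("AGCT", pos + 1)
    fragments.append(seq[start:])      # remainder after the last cut
    return fragments


# Hypothetical 300-bp stand-in with one AGCT site (not a real Alu sequence).
toy_alu = "A" * 168 + "AGCT" + "A" * 128
sizes = [len(fragment) for fragment in alu_i_digest(toy_alu)]
print(sizes)
```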
Long Interspersed Nuclear Element (LINE)
about 6,000 bp (5,000-7,000 bp) in length
copies: 10^2-10^4
e.g. Kpn I family: 6.5 kb in length, cut by Kpn I into four fragments of 1.9 kb, 1.8 kb, 1.2 kb and 1.5 kb, respectively
Highly repetitive sequences
10% of the human genome
length of repeat unit: <200 bp
copies: 10^6-10^8
types: satellite DNA and inverted repeat sequences
Highly repetitive sequences
Satellite DNA: constitutes the centromeres and telomeres of human chromosomes and the constitutive heterochromatin regions of some chromosomes. Its function is currently unknown.
minisatellite DNA: repeat unit 6-25 bp in length
microsatellite DNA: repeat unit 2-5 bp in length
Satellite DNA is found in the constitutive heterochromatin regions.
CsCl density-gradient centrifugation usually separates mouse DNA into a main band and a satellite band.
Inverted Repeat Sequence
5' TAATCCCACAGCCGCCAGTTCCGCTGGCGGCATTT 3'
3' ATTAGGGTGTCGGCGGTCAAGGCGACCGCCGTAAA 5'
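In an inverted repeat, one arm reads as the reverse complement of the other on the same strand. The sketch below searches the upper strand shown above for such arms; the function names and the choice of an 8-bp arm length are assumptions made for this illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_inverted_repeats(seq, arm):
    """Return (i, j) pairs where seq[i:i+arm] is the reverse complement
    of seq[j:j+arm] and the second arm starts after the first one ends."""
    hits = []
    for i in range(len(seq) - arm + 1):
        left_rc = reverse_complement(seq[i:i + arm])
        for j in range(i + arm, len(seq) - arm + 1):
            if seq[j:j + arm] == left_rc:
                hits.append((i, j))
    return hits


top = "TAATCCCACAGCCGCCAGTTCCGCTGGCGGCATTT"
for i, j in find_inverted_repeats(top, 8):
    print(i, top[i:i + 8], j, top[j:j + 8])
```

For this sequence the arms GCCGCCAG and CTGGCGGC are reverse complements of each other, so a single strand could fold back on itself to form a hairpin.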
Multigene families
A group of functionally related genes formed by duplication and variation of an ancestral gene, repeated on the same or different chromosomes
  clustered gene family (gene cluster)
  interspersed gene family
Clustered gene families
  Growth hormone        5 copies (67 kb)
  α-globin              7 copies (50 kb)
  Hox genes (multi)     38, in four clusters
  Olfactory receptors   1000, in 25 large clusters
Interspersed gene families
  Pax                      9 copies
  Actin                    >20 copies
  Alu elements (repeats)   1.1 million
  LINE elements (L1)       200-500,000
All globin genes arose from a single ancestral gene through repeated duplication, transposition and mutation.
Formation of higher order repeat units
Pseudogenes
• Nonfunctional copies of genes
• Formed by duplication of an ancestral gene, or by reverse transcription (and integration)
• Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences
Nonprocessed pseudogenes
Processed pseudogenes
Pseudogenes are common in multigene families such as the β-globin, HLA and immunoglobulin families.
Single-copy genes can also give rise to several pseudogenes; for example, the argininosuccinate synthetase (ASS) gene has four pseudogenes.
Pseudogenes are usually few and make up only a small fraction of the total gene number, but for the genes encoding mouse ribosomes the ratio of active genes to pseudogenes is as high as 1:15.
Nuclear genome: 3000 Mb
  Gene and gene-related sequences: 30%
    coding DNA: 10%
    noncoding DNA: 90% (pseudogenes, introns, flanking sequences, etc.)
  Extragenic sequences: 70%
    unique: 80%
    repetitive: 20%
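The nested percentages can be turned into absolute amounts of DNA. The following is a rough sketch that assumes each percentage applies to the category directly above it (30% and 70% of the genome; 10%/90% of gene-related DNA; 80%/20% of extragenic DNA); the helper name is illustrative.

```python
GENOME_MB = 3000  # nuclear genome size from the slide


def share(total_mb, *percents):
    """Apply successive percentage splits to a total amount of DNA."""
    value = total_mb
    for percent in percents:
        value = value * percent / 100
    return value


gene_related_mb = share(GENOME_MB, 30)
coding_mb = share(GENOME_MB, 30, 10)        # coding DNA
noncoding_mb = share(GENOME_MB, 30, 90)     # introns, pseudogenes, flanking DNA
extragenic_mb = share(GENOME_MB, 70)
unique_mb = share(GENOME_MB, 70, 80)
repetitive_mb = share(GENOME_MB, 70, 20)

print(coding_mb, noncoding_mb, unique_mb, repetitive_mb)
```

Under that reading, coding DNA amounts to about 90 Mb, roughly 3% of the nuclear genome.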
Mitochondrial Genome
– Small (16.5 kb) circular DNA
– rRNA, tRNA and protein-encoding genes (37)
– 1 gene/0.45 kb
– Very few repeats
– No introns
– 93% coding
– No recombination
– Maternal inheritance
Mitochondrial genome: 16,569 bp, 37 genes
  rRNA genes (2)
  tRNA genes (22)
  Polypeptide-encoding genes (13)
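The gene count and gene density quoted for the mitochondrial genome follow directly from these numbers. A short check in Python, with variable names chosen for this example:

```python
MT_GENOME_BP = 16569
MT_GENES = {"rRNA": 2, "tRNA": 22, "polypeptide": 13}

total_genes = sum(MT_GENES.values())
kb_per_gene = MT_GENOME_BP / total_genes / 1000  # average spacing in kb

print(total_genes, round(kb_per_gene, 2))
```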
5.2 Gene Structure
Definition
A gene is a segment of DNA encoding a functional product: RNA or polypeptide
Perhaps 30-40,000 genes in the human genome.
How can so few genes make a human?
How can so many genes make rice?
Perhaps 50-60,000 genes in the rice genome.
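Genome size & gene number
Gene number -> biological complexity?
• 1. Changes in gene number cannot explain the huge variation in biological function, regulatory mechanisms, and species diversity and complexity
• 2. Current explanation: diversity and complexity of the proteome -> diversity and complexity of species; ~10,000,000 kinds of protein molecules
• 3. Two views:
– a. Post-transcriptional level: mRNA splicing generates splice isoforms
– b. Protein level: post-translational modification at one or more sites in the protein sequence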
Genotype to Phenotype
Post-transcription level: mRNA splicing
[Figure: mRNA splicing produces isoform 1, isoform 2 and isoform 3]
Post-translation level: protein modification
Sumoylation
Phosphorylation
Palmitoylation
Acetylation
Ubiquitination
Interaction network
Protein-protein interaction
Hybridization of mRNA and DNA
Eukaryotic genes are split genes (with very few exceptions, such as the thrombomodulin gene, THBD)
A "Simple" Eukaryotic Gene
Introns
Exons
Flanking sequences
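Overall gene size varies enormously among eukaryotes, with a marked difference between yeast and higher eukaryotes.
Yeast genes average 1.4 kb in length, and only a few exceed 5 kb. By contrast, most genes in Drosophila and mammals are between 5 kb and 100 kb long, and only a few are shorter than 2 kb.
Yeast genes are usually small, whereas gene sizes in Drosophila and mammals are widely scattered.
The overall organization of genes also differs among eukaryotes such as yeast, insects and mammals.
In Saccharomyces cerevisiae most genes (>96%) are uninterrupted, and the exon-containing portions are usually compact.
Most yeast genes are uninterrupted, but the great majority of Drosophila and mammalian genes are interrupted (an uninterrupted gene has only one exon).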
Exons
Introns
Splicing junction
Flanking sequences
- Promoter
- Enhancer/silencer
- Terminator
Exons
Segment of a gene which is decoded to give a mature RNA product
Individual exons may contain coding DNA or noncoding DNA (untranslated sequences, UTS).
Coding region
Nucleotides (open reading frame) encoding the amino acid sequence of a protein
[Figure: exons of a gene, showing the Transcription Start Site, UTS, Translation Start Site (ATG), coding sequence, Translation Stop Site (TAA, TAG, TGA), UTS, the AATAAA signal and the Transcription Stop Site]
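The coding region defined above runs from the translation start codon (ATG) to the first in-frame stop codon (TAA, TAG or TGA). A minimal sketch of that reading-frame logic follows; the function name and the toy sequence are hypothetical, and real gene prediction involves far more than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq):
    """Return the first ATG...stop open reading frame in seq, or None."""
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return None


# Toy sequence: UTS-like flanks around ATG AAA CCC TAG.
toy = "GGATGAAACCCTAGTT"
print(first_orf(toy))
```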
• Human exons rarely exceed 800 bp, with a few exceptions: an exon of the factor VIII gene is about 3106 bp long, and an exon of the ApoB gene is about 7572 bp.
Protein-coding exons are usually short
Introns
Noncoding DNA which separates neighboring exons in a gene.
During gene expression introns, like exons, are transcribed into RNA, but the transcribed intron sequences are subsequently removed by RNA splicing and are not present in mature mRNA.
[Figure: exons and introns between the Transcription Start Site and the Transcription Stop Site; transcription gives the primary RNA and processing gives the mature RNA, both with UTS at each end]
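Intron size varies markedly among vertebrate genes.
Exon sequences are conserved; intron sequences are variable
Because the function of the encoded protein must be preserved, exon regions are conserved.
Introns evolve faster than exons. When genes from different species are compared, the exons are sometimes homologous while the introns differ greatly or share no related sequence at all.
The mutation rate is the same in exons and introns, but in exons negative (purifying) selection removes mutations more effectively.
Conserved exons can be used to isolate genes
Most methods for identifying genes rest on comparing the conservation of exons with the variability of introns. A gene whose function is conserved across species should encode a protein sequence with two properties: it has an open reading frame, and it has related sequences in other species. These features can be used to isolate genes.
Zoo blot
☺ Uses these two properties of conserved genes to detect the presence of a gene: first, Southern blot hybridization is carried out against genomic DNA samples from different species; genomic DNA clones that give a positive hybridization signal may contain an evolutionarily conserved coding sequence; the hybridizing sequence is then checked for an open reading frame.
☺ In this way, genes that are unfamiliar but carry some function can be isolated and purified.
Result of hybridizing a probe from the zfy gene on the human Y chromosome to the sex chromosomes of other animals
The DMD gene was characterized through zoo blotting, cDNA hybridization, genomic hybridization and protein analysis.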
Splice junction (exon/intron boundary)
Splice donor site: the junction between the end of an exon and the start of the downstream intron, commencing with the dinucleotide GT.
Splice acceptor site: the junction between the end of an intron, terminating in the dinucleotide AG, and the start of the next exon.
Branch site: the third conserved intronic sequence that is known to be functionally important in splicing.
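The GT...AG rule for splice junctions can be expressed as a simple check before an intron is removed. The sketch below is an illustration under stated assumptions: the function name and the toy exon and intron sequences are invented, and it ignores the branch site and the wider consensus around each junction.

```python
def splice_out(pre_mrna, intron_start, intron_end):
    """Remove pre_mrna[intron_start:intron_end] after checking that the
    intron begins with GT (donor) and ends with AG (acceptor)."""
    intron = pre_mrna[intron_start:intron_end]
    if len(intron) < 4 or not intron.startswith("GT") or not intron.endswith("AG"):
        raise ValueError("intron does not follow the GT...AG rule")
    return pre_mrna[:intron_start] + pre_mrna[intron_end:]


# Hypothetical toy gene: exon 1, one intron, exon 2 (DNA letters, as on the slides).
exon1 = "ATGGCTGAG"
intron = "GTAAGTCTTTTCTAG"
exon2 = "GTCAAGTAA"
pre = exon1 + intron + exon2

mature = splice_out(pre, len(exon1), len(exon1) + len(intron))
print(mature)
```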
ATGAAAAGGAAAGTCCTTACTTTTCTTTTGTTTTGCAAATTAGAAAGCCGACCGAGCAAAGAGATTTGAATTTTTACTGAAGCAGACAGAACTTTTTGCACATTTCATTCAGCCTTCAGCACAGAAATCTCCAACATCTCCACTGAACATGAAATTGGGACGTCCCCGAATAAAGAAAGAACAGAAATCTCCAACATCTCCACTGAACATGAAATTGGGACGTCCCCGAATAAAGAAAGATGAAAAGCAGAGCTTAATTTCTGCTGGAGAGTATGTTGGCACCTCTCTCTCTACTTTCTT TCCTTTCTCCTACCTTTTCTTCCTCCTGTCCTCCCTTGATCCCTTCATGCACCCCTTCGC TCTTCATTGTTCAGTATACCTTCATGTGACAAAAAATATTGCCATTATAATTATGTTTTG AAGACAACTATATTTTTTTCTCACTAGAGGCTGATCAGTAAAAATGTAGGCTGGTTCTAC TGATTTCTAAGCAAGACCTTGGACAACTCATTCTTTTTCTATAAAAGATAATAGCCATGT ACACTGATTTAATTGATACCTTATCATTTAGGTCGAATATGAAGGGATTTCCTTTTTTAA TTTCAGCTACCGCCATAGGCGCACAGAGCAAGAAGAAGATGAAGAGCTACTGTCTGAGAGTTTCAGCTACCGCCATAGGCGCACAGAGCAAGAAGAAGATGAAGAGCTACTGTCTGAGAGTCGGAAAACATCTAATGTGTGTATTAGATTTGAGGTGTCACCTTCATGTAAGTACTTCAT CACATTGGTGAGTTCTTTTTCAATTTAGTTTTAGAAAAATTTTACTTGAGTATGTTAATG AAAGTATGAAATGTCCTTGCATTTTTTCACCAGATGTGAAAGGGGGGCCACTGAGAGATTATCAGATTCGAGGACTGAATTGGTTGATCTCTTTATATGAAAATGGAGTCAATGGCATTTTGGCTGATGAAATGGTAAGGAATTGGTAGCTAAAAACACATTCTCAGTTATCAATGATTT
Splice junction (exon/intron boundary)
Consensus sequences are conserved throughout eukaryotes
Conservation of sequence is expected, since recognition of sequences is accomplished by base pairing with the RNA component of snRNPs
[Figure: secondary structure model of human U1 snRNP. The region where it recognizes the pre-mRNA is also shown]
Flanking Sequences
• 5' untranscribed region. Signals for initiation and control of transcription
  - Promoter
• Enhancer / Silencer
  - Enhancer stimulates transcription
  - Silencer inhibits transcription
• 3' untranscribed region. Signals for termination of transcription
Regulatory Sequences
Promoter/Proximal Elements
Occur within ~200 bp of the start site.
Contain up to ~20 bp.
Cell-type specific
Basal Promoter Analysis
• TATAA(T)AA(T)    -30    TBP
• GGC(T)CAATCT     -75    CTF/NF1
• GGGCGG           -90    SP1
[Figure: positions of the TATA, CAAT and GC boxes relative to +1]
Promoter-Proximal Elements
TATA box
  Most common
  Highly transcribed genes
  25~35 base pairs upstream of start site
Initiator
  At start site
GC boxes (CpG islands)
  "Housekeeping" genes (transcribed at low rate)
  Within ~100 base pairs of start site
TATA box
  ~25 bp upstream of +1
  Only promoter element that is relatively fixed in relation to the start point
  Tends to be surrounded by GC-rich sequences
  Single base substitutions in TATA: strong promoter down mutations
  Some promoters do not contain TATA
Initiator
Instead of a TATA box, some eukaryotic genes contain an alternative promoter element, called an initiator.
The initiator is highly degenerate.
5' Y Y A N T/A Y Y Y   (the A is at +1)
Y = pyrimidine (C or T), N = any
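The initiator consensus Y Y A N T/A Y Y Y maps directly onto a regular expression. The following sketch uses Python's standard `re` module; the pattern and function name are written for this example and the test sequence is invented. Because `finditer` reports non-overlapping matches, overlapping candidate sites would need a lookahead pattern instead.

```python
import re

# Y = C or T, N = any base, T/A = T or A; the A (third position) is +1.
INITIATOR = re.compile(r"[CT]{2}A[ACGT][AT][CT]{3}")


def find_initiators(seq):
    """Return the 0-based index of the +1 A for each consensus match."""
    return [match.start() + 2 for match in INITIATOR.finditer(seq)]


# Toy sequence containing CCAGTCCC, which fits the consensus.
print(find_initiators("GGCCAGTCCCGG"))
```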
CpG island
Genes coding for intermediary metabolism are transcribed at low rates, and do not contain a TATA box or initiator.
Most genes of this type contain a CG-rich stretch of 20-50 nt within ~100 bp upstream of the start site region.
A transcription factor called SP1 recognizes these CG-rich regions.
Gives multiple alternative mRNA start sites.
[Figure: CpG island within ~100 bp of multiple mRNA 5' start sites]
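Methods for studying the structure and function of eukaryotic promoters
Determining the position and length of the promoter: deletion experiments define the upstream boundary, and deletion combined with recombination experiments defines the downstream boundary.
Once the position is known, point mutations are used to study the role of each base in the promoter.
Methods for identifying promoter DNA-binding proteins:
Yeast one-hybrid
Phage display
Electrophoretic mobility shift assay (EMSA)
DNase I footprinting
Electrophoretic Mobility Shift Assay (EMSA)
A specialized gel electrophoresis technique for studying DNA-protein interactions in vitro. The principle: under the electric field in gel electrophoresis, a small free DNA fragment migrates toward the anode faster than the same fragment bound to protein. A short double-stranded DNA fragment can therefore be labeled and mixed with protein, and the mixture run on a gel. If the target DNA is bound by a specific protein, its migration toward the anode is retarded, and autoradiography of the gel reveals the DNA-binding protein. Because of its good specificity, EMSA is often used to confirm results obtained by other screening methods.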
EMSA - electrophoretic mobility shift assay
[Figure: EMSA gels; lanes labeled Probe and NE]
A Single Nucleotide Polymorphism in the MDM2 Promoter Attenuates the p53 Tumor Suppressor Pathway and Accelerates Tumor Formation in Humans. Cell, Vol. 119, 591–602, November 24, 2004
Enhancers
Can be located several kb from the promoter
Can be present in either orientation relative to the promoter
Contain elements that bind inducible factors
Usually ~100-200 bp long, containing multiple 8- to 20-bp control elements.
Targets for tissue-specific and/or temporal regulation
Enhancer
  Variable distance from promoter
  Either orientation
  Upstream or downstream of gene
TERMINATION
• RNA polymerase meets the terminator
• Terminator sequence: AAUAAA
• RNA polymerase releases from DNA
• Prokaryotes: releases at termination signal
• Eukaryotes: releases 10-35 base pairs after termination signal
Termination
• Different mechanisms of termination
• Prokaryotes
  – rho-independent termination: formation of a hairpin structure
  – rho-dependent termination: external protein disrupts transcription
• Eukaryotes
  – cleavage of the RNA by an external protein
Rho-independent terminator
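[Figure: summary of gene structure. Flanking sequences (promoter, enhancer, terminator) surround the region between the Transcription Start Site and the Transcription Stop Site; within it are exons and introns (each intron running from GT to AG), UTS at both ends, the Translation Start Site and Translation Stop Site, and the AATAAA signal]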
Distribution
• Different density of genes along a chromosome
• Different density of genes between chromosomes
(exon-intron-exon)n structure of various genes
histone:        total = 400 bp; exon = 400 bp
β-globin:       total = 1,660 bp; exons = 990 bp
HGPRT (HPRT):   total = 42,830 bp; exons = 1263 bp
factor VIII:    total = ~186,000 bp; exons = ~9,000 bp
Gene product                Size of gene (kb)   Number of exons   Average size of exon (bp)   Average size of intron (bp)
tRNAtyr                     0.1                 2                 50                          20
Insulin                     1.4                 3                 155                         480
β-Globin                    1.6                 3                 150                         490
Class I HLA                 3.5                 8                 187                         260
Serum albumin               18                  14                137                         1100
Type VII collagen           31                  118               77                          190
Complement C3               41                  29                122                         900
Phenylalanine hydroxylase   90                  26                96                          3500
Factor VIII                 186                 26                375                         7100
CFTR (cystic fibrosis)      250                 27                227                         9100
Dystrophin                  2400                79                180                         30 000
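The table makes it possible to estimate how much of each gene is exon sequence: number of exons times average exon size, divided by gene size. A rough sketch for three rows follows; the names are illustrative, and because it relies on rounded averages the results are approximate.

```python
# (gene size in kb, number of exons, average exon size in bp), from the table.
GENES = {
    "Insulin": (1.4, 3, 155),
    "Factor VIII": (186, 26, 375),
    "Dystrophin": (2400, 79, 180),
}


def exon_content(size_kb, n_exons, avg_exon_bp):
    """Return (total exon bp, exon share of the gene in percent)."""
    exon_bp = n_exons * avg_exon_bp
    percent = 100 * exon_bp / (size_kb * 1000)
    return exon_bp, percent


for name, row in GENES.items():
    exon_bp, percent = exon_content(*row)
    print(f"{name}: {exon_bp} bp of exon, {percent:.1f}% of the gene")
```

The estimate for factor VIII (26 x 375 = 9750 bp) is close to the ~9,000 bp of exon quoted for that gene, and for dystrophin less than 1% of the gene is exon.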
Genes
• Protein coding
• RNA genes
  – rRNA
  – tRNA
  – snRNA, …
"Average" gene organization
• Single, unique genes consisting of exons interrupted by introns only
Other gene organizations
• Genes-within-genes
  – It is not uncommon that short genes are located inside an intron of another gene
Intron 26 of the NF1 gene contains three internal genes.
THE END