CO 10
Genome:
The entire collection of genes encoded by a particular organism.
Determination of an entire genome sequence is a prerequisite to understanding the complete biology of an organism.
Genomics:
Structural: construction of sequence data and gene maps.
Functional: functions of genes, and their regulation and products.
Comparative: comparison of genes from different genomes to elucidate functional and evolutionary relationships.
History
1990: International Human Genome Project begins.
1. To generate physical, genetic, and sequence maps of the human genome.
2. To sequence the genomes of a variety of model organisms.
3. To develop improved technologies for mapping and sequencing.
4. To develop computational tools for capturing, storing, analyzing, displaying, and distributing map and sequence information.
5. To sequence EST (expressed-sequence tag) fragments of cDNA, and eventually full-length cDNAs, in different cell types of humans and mice.
6. To consider the ethical, social, and legal challenges posed by genomic information.
Fig. 10.1
What is in this chapter?
• Challenges and strategies of genome analysis
• Major insights emerging from complete genome sequences
• High-throughput tools for analyzing genomes and their products
Table 10.1
The genomes of living organisms vary enormously in size
Challenges and strategies of genome analysis
Sequences and polymorphisms
• Sequence error rate: 1% per sequence read
• Good genomic sequence: error rate of 1/10,000
• Polymorphisms: 1/500 bp
• Repeated sequences may be hard to place
• Unclonable DNA cannot be sequenced
Fig. 10.2
A divide-and-conquer strategy
10-fold sequence coverage
Sequencing every chromosomal region from 10 independent inserts can bring the error rate below 1/10,000.
Random sequence error: appears in about 1 of 10 sequence fragments covering a position
Polymorphism: appears in about 5 of 10 sequence fragments covering a position
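The logic of 10-fold coverage can be sketched in a few lines of Python. This is only an illustration: the function name `classify_column` and the 0.3 cutoff are assumptions, not values from the slides. A variant base seen in about 1 of 10 reads is treated as a likely read error, while one seen in roughly half of the reads is treated as a possible polymorphism.

```python
from collections import Counter


def classify_column(bases, min_het_fraction=0.3):
    """Classify one aligned position covered by several independent reads.

    bases: list of single-letter base calls, one per read.
    min_het_fraction: hypothetical cutoff (not from the source) above which
    a second base is treated as a polymorphism rather than a read error.
    """
    counts = Counter(bases).most_common()
    consensus, _ = counts[0]
    if len(counts) == 1:
        return consensus, "consistent"
    second_base, second_count = counts[1]
    if second_count / len(bases) >= min_het_fraction:
        return consensus, f"possible polymorphism ({consensus}/{second_base})"
    return consensus, "likely sequencing error"


# Ten reads over the same position, as in 10-fold coverage.
print(classify_column(list("AAAAAAAAAG")))  # one odd read out of ten
print(classify_column(list("AAAAAAGGGG")))  # close to half of the reads differ
```

A real assembler would also weigh base-quality scores; the fixed cutoff here only mirrors the 1/10 versus 5/10 contrast.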
Major techniques in genome characterization
Cloning
Hybridization
PCR amplification
Sequencing
Computational tools
Three types of maps used in the analysis of the human genome
• Linkage map (DNA markers)
• Physical map (divide and conquer)
• Sequence map
Human genome: 3 × 10^9 bp
Fig. 10.3
The making of large-scale linkage maps
Two common types of polymorphisms used for mapping
DNA markers: SNPs and SSRs
(SSRs expand or contract during replication)
Genomewide identification of genetic markers
Identification of SSRs by specific pairs of PCR primers
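Why a primer pair flanking an SSR works as a marker can be shown with a toy in-silico PCR. This is a sketch under stated assumptions: the flanking sequences, the (CA)n repeat, the repeat counts, and the helper names `amplicon_length` and `make_allele` are all hypothetical. Alleles with different repeat counts give products of different lengths from the same primer pair.

```python
def amplicon_length(template, forward_primer, reverse_site):
    """Return the length of the PCR product, or None if a primer site is missing.

    Simplification: both primer sites are given as they read on the same
    strand of the template, so no reverse-complementing is done here.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Hypothetical unique flanking sequences around a (CA)n repeat.
LEFT_FLANK = "GATTACAGGT"
RIGHT_FLANK = "TTCGGATCCA"


def make_allele(repeat_count):
    """Build a toy allele carrying the given number of CA repeat units."""
    return LEFT_FLANK + "CA" * repeat_count + RIGHT_FLANK


short_allele = make_allele(12)
long_allele = make_allele(15)
print(amplicon_length(short_allele, LEFT_FLANK, RIGHT_FLANK))  # 44
print(amplicon_length(long_allele, LEFT_FLANK, RIGHT_FLANK))   # 50
```

The 6-bp difference corresponds to the three extra CA units, which is the size difference a gel or capillary run would resolve.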
Human Linkage Map
• 20,000 SSRs, 4 million SNPs.
Fig. 10.4
In humans: 1 cM = 1 Mb
In mice: 1 cM = 2 Mb
Physical maps
Overlapping DNA fragments that are ordered and oriented and span each of the chromosomes in a genome
The molecular counterparts of linkage maps
How to build the long-range physical maps:
Bottom-up and Top-Down approaches
A hypothetical physical map generated by the analysis of sequence-tagged sites (STSs)
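The bottom-up idea, ordering clones by the STSs they share, can be sketched as follows. The clone names, STS names, and the function `order_clones` are hypothetical, and the sketch assumes the simplest case: one linear contig in which each clone shares STSs only with its immediate neighbors.

```python
def order_clones(clone_sts):
    """Chain clones into a linear order using shared STS content.

    clone_sts: dict mapping a clone name to the set of STSs detected in it.
    Assumes one linear contig in which only adjacent clones share an STS.
    """
    names = list(clone_sts)
    # Two clones are neighbors if they have at least one STS in common.
    neighbors = {
        a: [b for b in names if b != a and clone_sts[a] & clone_sts[b]]
        for a in names
    }
    # A clone at the end of the contig overlaps exactly one other clone.
    ends = [name for name in names if len(neighbors[name]) == 1]
    if not ends:
        raise ValueError("no contig end found; data do not form a simple chain")
    order = [min(ends)]  # start from one end, chosen deterministically
    while True:
        unvisited = [n for n in neighbors[order[-1]] if n not in order]
        if not unvisited:
            break
        order.append(unvisited[0])
    return order


# Hypothetical STS content for four overlapping clones.
clones = {
    "cloneB": {"STS2", "STS3"},
    "cloneA": {"STS1", "STS2"},
    "cloneC": {"STS3", "STS4"},
    "cloneD": {"STS4", "STS5"},
}
print(order_clones(clones))  # ['cloneA', 'cloneB', 'cloneC', 'cloneD']
```

Real STS-content maps have to tolerate false positives, chimeric clones, and gaps, which this chain walk does not attempt.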
Fig. 10.5
Chromosome 7 at three levels of banding resolution (figure label: metaphase)
Dark band: gene poor, AT rich
Light band: gene rich, GC rich
Fig. 10.6
FISH (fluorescent in situ hybridization)
Advantages of FISH compared to linkage mapping
1. All clones can be mapped by FISH, but only those that detect polymorphisms can be mapped by linkage analysis.
2. FISH can be done on any cloned locus in isolation, but linkage requires the analysis of one locus in relation to another.
3. FISH requires only a single sample; linkage requires genotype information from a large cohort of individuals.
Disadvantage: low resolution (4-8 Mb)
A sequence map is the highest-resolution genomic map
Hierarchical shotgun approach
Whole-genome shotgun approach
Fig. 10.12
Hierarchical shotgun approach
Minimal overlapping BACs
10× coverage across the BAC insert
200 kb × 10 / 2 kb = 1,000 clones
Fig. 10.13
Whole genome shotgun approach
10-fold sequence coverage
3 × 10^9 × 6 / 2,000 = 9 × 10^6
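The clone-count arithmetic on the shotgun slides follows one formula: target length times coverage, divided by insert length. A minimal sketch, with `clones_needed` as a hypothetical helper name and the inputs taken from the slide figures:

```python
import math


def clones_needed(target_bp, coverage, insert_bp):
    """Number of inserts of length insert_bp needed to cover target_bp
    to the requested fold coverage (rounded up)."""
    return math.ceil(target_bp * coverage / insert_bp)


# Hierarchical shotgun: one 200-kb BAC at 10-fold coverage with 2-kb inserts.
print(clones_needed(200_000, 10, 2_000))        # 1000

# Whole-genome shotgun: the 3 x 10^9 * 6 / 2000 expression from the slide.
print(clones_needed(3_000_000_000, 6, 2_000))   # 9000000
```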
Whole genome shotgun approach
Advantage: no physical map needs to be constructed.
Disadvantage: some genomic sequences cannot be cloned.
The Human Genome Project has changed the practice of biology, genetics, and genomics
Gene finding and gene-function analyses:
• Through comparative genomics, identification of genes and gene functions in a second genome is facilitated by sequence homology.
• Genes often encode one or more protein domains. This information provides insights into the functions of a protein.
Fig. 10.14
Fig. 10.15
Syntenic blocks
Major insights from the human and model organism genome sequences
1. There are approximately 30,000 human genes.
2. Genes encode either noncoding RNAs or proteins.
Noncoding RNAs: tRNA, rRNA, snoRNA (small nucleolar RNAs), snRNA (small nuclear RNAs)
3. Higher complexity of the human proteome: more genes, more paralogs, alternative splicing.
Homologous genes: genes with enough sequence similarity to be evolutionarily related.
Orthologous genes: genes in two different species that arose from the same gene in the two species' common ancestor, identified by their sequence similarity.
Paralogous genes: genes that arise by duplication within the same species.
4. More domain architectures
5. Chemical modification of proteins
• 400 different chemical modifications
• 1 million different proteins
6. Repeated sequences constitute more than 50% of the human genome.
Transposon-derived repeats, pseudogenes, or simple sequence repeats
7. The genome contains distinct types of gene organization
A) Gene families: multiple related genes, e.g., the olfactory receptor gene family (1,000 genes), histones, hemoglobins
Fig. 10.19 Olfactory receptor gene family
1. One gene undergoes duplication to generate 20 paralogs.
2. Massive duplication created 30 sites carrying the original 20-paralog family.
Fig. 10.20
B) Gene-rich regions: 60 genes in 700 kb; 70% of the DNA is transcribed
C) Gene deserts
82 gene deserts: no identifiable gene within a megabase
Fig. 10.21
Combinatorial strategies may amplify genetic information and generate diversity at the DNA level
Antibody or T-cell receptor genes: VDJ recombination
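How a combinatorial strategy amplifies information can be shown with simple counting: if one segment of each type is chosen independently, the number of possible combinations is the product of the segment counts. The segment counts below are illustrative placeholders, not values from the slides, and `combinations_from_segments` is a hypothetical helper.

```python
import math


def combinations_from_segments(*segment_counts):
    """Number of distinct joined products when one segment is picked
    independently from each pool (e.g. one V, one D, one J)."""
    return math.prod(segment_counts)


# Placeholder pool sizes, chosen only to illustrate the arithmetic.
v_segments, d_segments, j_segments = 50, 20, 5
print(combinations_from_segments(v_segments, d_segments, j_segments))  # 5000
```

With these placeholder numbers, 75 gene segments yield 5,000 possible combinations, which is the sense in which recombination amplifies the information stored in the genome.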
Fig. 10.22
Combinatorial strategies may amplify genetic information and generate diversity at the RNA level
High-throughput genomic and proteomic platforms permit the global analysis of gene products
Fig. 10.23
Sanger sequencing scheme
DNA arrays
Macroarray: cDNA on nylon membrane
Microarray: PCR amplified product on glass-slide
Oligonucleotide array: chemically synthesized DNA or RNA, 20-60 nt long
Fig. 10.25
Two-color DNA microarray (figure labels: normal, tumor)
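Two-color array data are commonly summarized as a log2 ratio of the two channel intensities for each spot. A minimal sketch, assuming made-up intensity values and hypothetical gene names; the slides do not say which dye labels which sample, so the code refers only to tumor and normal signals.

```python
import math


def log2_ratio(tumor_signal, normal_signal):
    """log2(tumor / normal): positive means higher signal in tumor,
    negative means higher signal in normal, 0 means no difference."""
    return math.log2(tumor_signal / normal_signal)


# Hypothetical (tumor, normal) intensities for three spots.
spots = {
    "geneX": (800.0, 200.0),
    "geneY": (100.0, 400.0),
    "geneZ": (300.0, 300.0),
}
for gene, (tumor, normal) in spots.items():
    print(gene, log2_ratio(tumor, normal))
```

Real analyses subtract background and normalize between channels before taking the ratio; those steps are left out here.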
Fig. 10.27
Protein analyses
Mass/charge ratios
Fig. 10.28
MPSS (massively parallel signature sequencing): a method to characterize the transcriptome
Fig. 10.31
Protein-protein interaction: affinity purification and mass spectrometry
Fig. 10.32
The yeast two-hybrid
Systems biology
Global study of multiple components of biological systems and their simultaneous interactions
Systems biology approaches
1. Formulate a computer-based model based on current understanding.
2. Define as many of the system's elements as possible by discovery science.
3. Perturb the system either genetically or environmentally and measure changes.
Fig. 10.33
Perturb the system and measure changes
Fig. 10.34
Fig. 10.35
4. Integrate the biological information, and compare these data against the predictions of the model.
5. Formulate hypotheses to explain disparities between experimental data and the model, and use these hypotheses as the basis for a second round of perturbation.
6. Refine the model until model and experiment are in accord with one another.
TABLES
Table 10.2a
Table 10.2b
Table 10.3
Fig. 10.9b
Fig. 10.11
Fig. 10.16
Fig. 10.26
Fig. 10.36
Fig. 10.29
Fig. 10.30
Fig. 10.7
Basic procedures in building a whole chromosome physical map