8/3/2019 The Human Genome Project-Babak Nami
The Human Genome Project
Babak Nami
Department of Medical Genetics, Selçuk University
-
Human Genome
The human genome is the genome of Homo sapiens, which is stored on 23 chromosome pairs.
22 of these are autosomal chromosome pairs, while the remaining pair is sex-determining.
The haploid human genome comprises about 3 billion DNA base pairs.
The haploid human genome contains ca. 23,000 protein-coding genes, far fewer than had been expected before its sequencing.
In fact, only about 1.5% of the genome codes for proteins, while the rest consists of non-coding RNA genes, regulatory sequences, introns, and noncoding DNA (once known as "junk DNA").
-
Human Genome
Information content of the haploid human genome by chromosome:
Haploid means we only count one of each chromosome pair. For this reason, the total information content for a woman (XX) is less than for a man (XY), where both the X and the Y are counted.
-
How much data make up the human genome?
3 bookcases with 40 books per bookcase x 5000 pages per book x 5000 bases per page = 3,000,000,000 bases!
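The bookcase arithmetic can be checked in a few lines. This is only a sanity check of the figures on the slide; the variable names are illustrative.

```python
# Figures taken from the slide; names are illustrative only.
bookcases = 3
books_per_bookcase = 40
pages_per_book = 5000
bases_per_page = 5000

total_bases = bookcases * books_per_bookcase * pages_per_book * bases_per_page
print(f"{total_bases:,} bases")
```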
-
Human Genome Project
The Human Genome Project (HGP) is an international scientific research project with the primary goals of determining the sequence of chemical base pairs which make up DNA and of identifying and mapping the approximately 20,000-30,000 genes of the human genome from both a physical and a functional standpoint.
-
History
1985. Proposed.
1988. Initiated and funded by NIH and US Dept. of Energy ($3 billion set aside).
1990. Work begins.
[...] project early
2001. Published in Science and Nature in February.
2002. The quest for genome sequencing was being pursued simultaneously in over 20 laboratories in six countries.
2003. The whole genome sequenced.
-
History
Initiative Office of HGP
-
Human Genome Project Goals and Completion Dates
(Timeline figure, 1990 to 2005)
1994: Genetic map, 1 cM resolution (3,000 markers)
1998: Physical map (3,000 markers)
2003: DNA sequence (99% of gene-containing [...])
2003: 3 million mapped SNPs
2003: 15,000 full-length cDNAs
-
NIH put the human genome sequence on the web July 7, 2000.
Cyber geeks searched for hidden messages, and UCSC put the human genome sequence on CD in October 2000, with varying results.
-
The first printout of the human genome to be presented as a series of books, displayed at the Wellcome Collection, London
-
Goals of the Project
identify all the approximately 30,000 genes in human DNA,
determine the sequences of the 3 billion chemical base pairs that make up human DNA,
store this information in databases,
improve tools for data analysis,
transfer related technologies to the private sector, and
address the ethical, legal, and social issues (ELSI) that may arise from the project.
-
In Biotechnology
Production of useful protein products for use in medicine, agriculture, bioremediation and pharmaceutical industries.
Protein replacement (factor VIII, TPA, streptokinase, insulin, interferon)
BT insecticide toxin (from Bacillus thuringiensis)
Herbicide resistance (glyphosate resistance)
Bioengineered foods
Pharm animals
-
Proteomics
Investigates patterns and levels of gene expression in diseased cells that can be [...] profiles.
-
DNA Chip Technology
-
In Pharmacogenomics
Investigates DNA mutations associated with disease susceptibility and drug sensitivities.
Prodrug gene therapy for cancers
-
In Developmental Biology
Regulation of embryonic development.
Regulation of the aging process.
Regulation of metabolism.
-
Evolutionary and Comparative Biologists
Because DNA mutates at a roughly constant rate, comparisons of DNA between different [...]
-
Human Genome Sequence Variation
Develop technologies for rapid, large-scale identification and scoring of single-nucleotide polymorphisms and other DNA sequence variants.
Identify common variants in the coding regions of the majority of identified genes during this 5-year period.
Create a SNP map of at least 100,000 markers.
Develop the intellectual foundations for studies of sequence variation.
Create public resources of DNA samples and cell lines.
-
Model organisms
Bacteria (E. coli, H. influenzae, several others)
Yeast (Saccharomyces cerevisiae)
Plant (Arabidopsis thaliana)
Roundworm (Caenorhabditis elegans)
Fruit fly (Drosophila melanogaster)
Mouse (Mus musculus)
-
How does the human genome stack up?
Organism                              Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                  3 billion             30,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9700                  9
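One way to read the table is as gene density, i.e. estimated genes per million bases. The sketch below uses only the figures from the table above; the function name and dictionary layout are illustrative assumptions.

```python
# Genome size (bases) and estimated gene count, copied from the table above.
GENOMES = {
    "Human": (3_000_000_000, 30_000),
    "Laboratory mouse": (2_600_000_000, 30_000),
    "Mustard weed": (100_000_000, 25_000),
    "Roundworm": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "E. coli": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}


def genes_per_megabase(genome_size, gene_count):
    """Estimated genes per million bases (hypothetical helper)."""
    return gene_count / (genome_size / 1_000_000)


# Print organisms from the sparsest to the densest genome.
for name, (size, genes) in sorted(
    GENOMES.items(), key=lambda item: genes_per_megabase(*item[1])
):
    print(f"{name:18s} {genes_per_megabase(size, genes):8.1f} genes/Mb")
```

On these figures the human and mouse genomes are by far the sparsest, which fits the later point that most of the human genome does not code for proteins.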
-
AAGTTC CTAAGC ATTCGG
AAGTTC CTAAGC
AAGTTC
-
Practical Goals
-
http://www.genome.gov/Pages/News/PaceofDiseaseGeneDiscovery.pdf
-
Sequencing Strategy
Once a contig map of the genome was obtained, it was necessary to sequence each individual clone.
Most of the actual human genome sequencing was done on BAC clones, which are less prone to rearrangement than YAC clones. BACs are about 100-200 kbp long.
Shotgun sequencing: The large cloned DNA is randomly broken up into a series of small fragments (less than 1 kb). These fragments are cloned and sequenced. A computer program then assembles them based on overlaps between the sequences of each clone.
To ensure that every bit has been covered, you need to sequence random clones until you have covered each spot 5-10 times on average.
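The "5-10 times on average" figure is a coverage depth: total bases read divided by the length of the clone. A rough sketch follows, assuming reads land uniformly at random (an idealized Poisson model, under which a fraction of about e^(-c) of positions is missed at coverage c). The function names and the example numbers (a 150 kbp BAC, 500 bp reads) are illustrative, not taken from the project's actual protocol.

```python
import math


def coverage(num_reads, read_length, target_length):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / target_length


def reads_needed(target_length, read_length, desired_coverage):
    """Number of random reads needed to reach a desired average coverage."""
    return math.ceil(desired_coverage * target_length / read_length)


def expected_uncovered_fraction(c):
    """Fraction of bases expected to be missed at coverage c (idealized model)."""
    return math.exp(-c)


n = reads_needed(target_length=150_000, read_length=500, desired_coverage=8)
print(n, "reads for 8x coverage of a 150 kbp clone")
print(f"expected uncovered fraction: {expected_uncovered_fraction(8):.5f}")
```

Under this model, even 5x coverage leaves under 1% of positions unread, which is why a depth of 5-10x is usually enough to close most gaps.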
-
Sequencing: BAC-based method
Each clone: 150,000-200,000 bp, cloned in bacteria
BAC clones mapped
Clones split into subclones
Subclones: 2,000 bp
Sequenced 10 times in 500-800 bp segments
Subclone sequences re-assembled
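The re-assembly step can be illustrated with a toy greedy overlap merger. This is a minimal sketch, not the assembler used by the project: it assumes error-free reads on one strand, and greedy merging is not guaranteed to find the correct assembly when repeats are present. The reads are made up from the short example sequence AAGTTC CTAAGC ATTCGG shown on an earlier slide.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical overlapping reads covering AAGTTCCTAAGCATTCGG.
example_reads = ["AAGTTCCTA", "TCCTAAGCAT", "AGCATTCGG"]
print(greedy_assemble(example_reads))
```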
-
Sequencing Technologies
The two basic sequencing approaches, Maxam-Gilbert and Sanger, differ primarily in the way the nested DNA fragments are produced.
Maxam-Gilbert sequencing (also called the chemical degradation method) uses chemicals to cleave DNA at specific bases, resulting in fragments of different lengths. A refinement to the Maxam-Gilbert method known as multiplex sequencing enables investigators to analyze about 40 clones on a single DNA sequencing gel.
Sanger sequencing (also called the chain termination or dideoxy method) involves using an enzymatic procedure to synthesize DNA chains of varying length in four different reactions, stopping the DNA replication at positions occupied by one of the four bases, and then determining the resulting fragment lengths.
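The logic of reading a sequence from four chain-termination reactions can be sketched as follows. This is an idealized model: it assumes every position is terminated in exactly one lane and that fragment lengths are resolved perfectly, whereas real termination is random and lengths are read from a gel. The function names are illustrative.

```python
def chain_termination_reactions(synthesized):
    """For each base, list the fragment lengths at which synthesis stops.

    Each of the four reactions stops chains at positions occupied by one
    base, so lane 'A' holds the lengths of all fragments ending in A.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Recover the sequence by reading fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = chain_termination_reactions("AAGTTC")
print(lanes)
print(read_gel(lanes))
```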
-
Advanced Techniques
SOLiD Sequencing
Helicos High speed Gene Sequencing
Laser Sequencing
-
How the Code was Decoded
DoubleTwist Inc, an application service provider (ASP) devoted to empowering life scientists, completed the first annotation of the human genome.
The DoubleTwist human genome database was created [...] a total of more than 350 processors.
It brought to a close an extensive analysis of the available HGP data that revealed genes and other valuable information. The task was accomplished using Sun Enterprise supercomputers, including Starfire servers.
-
Genome Map
A genome map describes the order of genes or other markers and the spacing between them on each chromosome. Human genome maps are constructed on several different scales or levels of resolution.
-
Genetic Map
Genetic linkage maps of each chromosome are made by determining how frequently two markers are passed together from parent to child. Because genetic material sometimes is exchanged during the production of sperm and egg cells, groups of traits (or markers) originally together on one chromosome may not be inherited together.
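How often two markers are separated can be turned into a map distance. This is a minimal sketch, assuming the usual convention that 1 cM corresponds to about a 1% chance of recombination between two markers (a good approximation only for closely linked markers). The offspring counts are hypothetical.

```python
def recombination_fraction(recombinant_offspring, total_offspring):
    """Fraction of offspring in which the two markers were separated."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return recombinant_offspring / total_offspring


def approx_map_distance_cm(recombinant_offspring, total_offspring):
    """Approximate genetic distance in centimorgans (valid for small distances)."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical data: markers separated in 12 of 200 offspring.
print(recombination_fraction(12, 200))
print(approx_map_distance_cm(12, 200), "cM")
```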
-
. . . to a multi-resolution view . . .
-
. . . at the gene cluster level . . .
-
. . . the single gene level . . .
-
. . . and at the single base level
caggcggactcagtggatctggccagctgtgacttgacaag
caggcggactcagtggatctagccagctgtgacttgacaag
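The two sequences above differ at a single base. A short comparison locates it; this is a sketch for equal-length, already aligned sequences, and the function name is illustrative.

```python
def single_base_differences(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


first = "caggcggactcagtggatctggccagctgtgacttgacaag"
second = "caggcggactcagtggatctagccagctgtgacttgacaag"
print(single_base_differences(first, second))
```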
-
X-ray hybrid mapping
X-ray hybrids are made by irradiating a human cell line with 3000 rad of X-rays, fusing it to hamster cells, and isolating hybrid cell lines in culture.
A panel of 100-200 hybrids with 5-10 different fragments of human DNA in each gives about 1000 fragments in total, i.e. the human genome has been divided into 1000 bits.
The closer together two markers are, the more likely it is that they will be present in the same hybrids (since they are less likely to be separated by an X-ray induced break).
By doing a PCR assay for each marker on all the hybrids, a map can be made. The units are called cR (centiray, where 1 cR is a 1% chance that the markers will be separated by X-ray breakage).
-
For each pair of markers in turn the "co-retention frequency" is the number of hybrids in which both markers are present, divided by the number of hybrids in which one or other (or both) markers are present. On the figure, there are 5 hybrids containing both markers B and C, and 6 containing B and/or C. Therefore the co-retention frequency is 5/6 or 0.83. Likewise it is 6/7 for markers E and F, and 2/10 for markers C and E. This shows that B and C are close together and E and F are close together, whereas C and E are farther apart.
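The co-retention frequency defined above is the size of the intersection divided by the size of the union of the two sets of hybrids. The sketch below uses a hypothetical panel, since the figure itself is not reproduced here; the hybrid numbers are invented so that the three ratios quoted in the text (5/6, 6/7, 2/10) come out.

```python
def co_retention_frequency(hybrids_with_a, hybrids_with_b):
    """Hybrids containing both markers / hybrids containing at least one."""
    a, b = set(hybrids_with_a), set(hybrids_with_b)
    either = a | b
    if not either:
        raise ValueError("neither marker is present in any hybrid")
    return len(a & b) / len(either)


# Hypothetical retention data: hybrid cell line numbers in which each marker is found.
panel = {
    "B": {1, 2, 3, 4, 5, 6},
    "C": {1, 2, 3, 4, 5},
    "E": {4, 5, 6, 7, 8, 9, 10},
    "F": {4, 5, 6, 7, 8, 9},
}

for m1, m2 in [("B", "C"), ("E", "F"), ("C", "E")]:
    print(m1, m2, round(co_retention_frequency(panel[m1], panel[m2]), 2))
```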
-
Clone contigs
A clone contig is a series of cloned DNA segments that overlap each other, assembled in the correct order along the genome.
Cosmids (capacity 45 kb)
BACs or YACs (Bacterial or Yeast Artificial Chromosomes), which can clone 100s of kb of DNA and are more suitable for dealing with large stretches of mammalian DNA.
-
Making a clone contig by fingerprinting
-
What does the draft human genome sequence tell us?
By the Numbers
The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).
Gene sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
The total number of genes is estimated at around 30,000, much lower than previous estimates of 80,000 to 140,000.
Almost all (99.9%) nucleotide bases are exactly the same in all people.
The functions are unknown for over 50% of discovered genes.
-
What does the draft human genome sequence tell us?
How It's Arranged
The human genome's gene-dense "urban centers" are predominantly composed of the DNA building blocks G and C.
In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T.
GC- and AT-rich regions usually can be seen through a microscope as light and dark bands on chromosomes.
Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.
Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to help regulate gene activity.
Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
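The contrast between GC-rich and AT-rich regions can be measured as GC content over fixed windows. This is a minimal sketch: the toy sequence and the window size are invented for illustration, and real analyses use much larger windows.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window):
    """GC content of consecutive, non-overlapping windows of the given size."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGGCCATATTTAA"
print(windowed_gc(toy, window=8))
```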
-
What does the draft human genome sequence tell us?
The Wheat from the Chaff
Less than 2% of the genome codes for proteins.
Repeated sequences that do not code for proteins ("junk DNA") make up about 50% of the human genome.
Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics.
The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
-
What does the draft human genome sequence tell us?
How the Human Compares with Other Organisms
Unlike the human's seemingly random distribution of gene-rich areas, many other organisms' genomes are more uniform, with genes evenly spaced throughout.
Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA transcript "alternative splicing" and chemical modifications to the proteins. This process can yield different protein products from the same gene.
Humans share most of the same protein families with worms, flies, and plants; but the number of gene family members has expanded in humans, especially in proteins involved in development and immunity.
Although humans appear to have stopped accumulating repeated DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, although gene estimates are similar in these species. Scientists have proposed many theories to explain evolutionary contrasts between humans and other organisms, including those of life span, litter sizes, inbreeding, and genetic drift.
-
What does the draft human genome sequence tell us?
Variations and Mutations
Scientists have identified about 3 million locations where single-base DNA differences (SNPs) occur in humans. This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history.
The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females. Researchers point to several reasons for the higher mutation rate in the male germline, including the greater number of cell divisions required for sperm formation than for eggs.