

CHAPTER 2

Human Genome Project and its Ethical Issues

2.1. INTRODUCTION

The Human Genome Project (HGP) was a 13-year effort that was formally initiated in October 1990. The idea for the project first came from discussions held during scientific meetings organized by the US Department of Energy and other scientific organizations between 1984 and 1986. The project was planned to span 15 years, but rapid technological advances accelerated its completion to within 13 years, i.e., in 2003. Funds of three billion US dollars were earmarked for sequencing the more than two meters of human DNA. The goal of the project was to determine the complete sequence of the three billion (3 × 10^9) DNA subunits (bases), identify all human genes, and make them accessible for further biological study. As a part of the HGP, parallel sequencing was done for selected model organisms, such as the bacterium E. coli, to help develop the technology and interpret human gene function. The Department of Energy’s ‘Human Genome Program (HGP)’ and the National Institutes of Health’s ‘National Human Genome Research Institute (NHGRI)’ together sponsored the US Human Genome Project. Ari Patrinos, head of the Office of Biological and Environmental Research, directed the Department of Energy’s ‘Human Genome Program’ research. Francis Collins directed the efforts of the National Institutes of Health’s National Human Genome Research Institute.

The Corporate Genome Project was initiated rather late by Celera Genomics, a company founded by a former NIH scientist, Craig Venter, and funded by Perkin-Elmer, a large instrumentation manufacturer that makes and sells instruments to the government and to the private sector as well.

The Human Genome Project (HGP) and the Corporate Genome Project (CGP) are two distinctly different entities with different cultures and attitudes. The difference between them has centered on the privacy issue. Celera wanted to retain some of the information, or control over the information, that it was going to publish about the human genome because its business model depended on doing so.

From the inception of this project, owing to the huge budget for the sequencing of human DNA, many laboratories around the United States received


determine, among other things, how the organism looks, how well its body metabolizes food or fights infection, and sometimes even how it behaves. The human genome is made up of DNA (which has four different chemical building blocks).

DNA is made up of four similar chemicals (called bases and abbreviated A, T, C and G) that are repeated millions or billions of times throughout a genome. The human genome, for example, has 3 billion pairs of bases.

In DNA, the particular order of As, Ts, Cs and Gs is extremely important. The order underlies all of life’s diversity, even dictating whether an organism is human or another species, such as yeast, rice, or fruit fly, all of which have their own genomes and are themselves the focus of genome projects. Since all organisms are related through similarities in DNA sequences, insights gained from nonhuman genomes often lead to new knowledge about human biology. To get an idea of the size of the human genome present in each of our cells, consider the following analogy: If the DNA sequence of the human genome were compiled in books, the equivalent of 200 volumes the size of a telephone book (at 1000 pages each) would be needed to hold it all (Fig. 2.1).

Storing all this information is a great challenge for computer experts known as bioinformatics specialists. One million bases (called a megabase and abbreviated Mb) of DNA sequence data is roughly equivalent to 1 megabyte of computer data storage space. Since the human genome is 3 billion base pairs long, 3 gigabytes of computer data storage space is needed to store the entire genome. This includes only nucleotide sequence data, and does not include data annotations and other information that can be associated with sequence data.
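The storage estimate above is simple arithmetic, and a short Python sketch can make it explicit. The function name and the two-bit packing option are assumptions added here for illustration; the text itself only gives the rule of thumb of roughly one byte per base.

```python
# Rough storage estimate for raw sequence data (no annotations).
GENOME_BASES = 3_000_000_000  # approximate human genome length, as given in the text


def storage_bytes(n_bases: int, bits_per_base: int = 8) -> int:
    """Return the number of bytes needed to hold n_bases at bits_per_base each."""
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


# One byte per base, the rule of thumb used in the text.
plain = storage_bytes(GENOME_BASES)
# Assumption for comparison: A, C, G, T packed into two bits each.
packed = storage_bytes(GENOME_BASES, bits_per_base=2)

print(plain / 1e9, "gigabytes at one byte per base")
print(packed / 1e9, "gigabytes at two bits per base")
```

At one byte per base this reproduces the 3 gigabyte figure; the packed variant only shows how much the estimate depends on the encoding chosen.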

As time goes on, more annotations will be entered as a result of laboratory findings, literature searches, data analyses, personal communications, automated data-analysis programs, and auto annotators. These annotations associated with the sequence data are likely to dwarf the amount of storage space actually taken up by the initial 3 billion nucleotide sequence. Of course, that’s not much of a surprise because the sequence is merely a starting point for a much deeper biological understanding.

Human beings are also similar to other living organisms in their basic cell characteristics.

Cells: These are the fundamental working units of every living system. All the instructions

Fig. 2.1. Compiling the DNA sequence from the human genome into books would require 200 volumes, each the size of the 1,000 page Bangalore telephone book.


Fig. 2.2. Human genome and nature of DNA.

blocks, G and C. In contrast, the gene-poor “deserts” are rich in the DNA building blocks, A and T. GC- and AT-rich regions can usually be seen through a microscope as light and dark bands on chromosomes.
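The contrast between GC-rich and AT-rich regions can be illustrated by computing the GC fraction of a sequence in fixed-size windows. This is a minimal sketch only; the function names, the window size and the toy sequence are assumptions made for illustration, not data from the project.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int) -> list:
    """GC fraction of consecutive, non-overlapping windows along seq."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGCGCGC" + "ATATATATAT"
print(windowed_gc(toy, window=10))
```

On real chromosomes the windows would be far larger, and the resulting profile would rise in gene-dense regions and fall in the gene-poor “deserts”.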

Genes appear to be concentrated in random areas along the genome, with vast expanses of non-coding DNA in between.

Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the “junk DNA”. These CpG islands are believed to help regulate gene activity. Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).

Scientists have identified about 1.4 million locations where single-base DNA differences, i.e., single nucleotide polymorphisms (SNPs), occur in humans. This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history.

2.4. GENOME SEQUENCING

Genome sequencing is the term used to describe the laboratory process of reading the order of the four letters of the genetic alphabet (A, C, G, T) along a strand of DNA. The steps involved in such efforts are as follows:

1. Selection of suitable sample materials.

2. Isolation of DNA from the cells, and preparation of large samples of high-quality DNA from these cells.

3. Cutting the purified DNA at random sites into overlapping pieces of a manageable size.

4. Insertion of these DNA pieces into packages for the production of unlimited copies of such selected DNA.

5. Recording the order of bases for each DNA sample piece by using DNA sequencing techniques.

6. Determination of the overlap of each piece, and assembling the sequences to give the final genome of the human.
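Steps 3 to 6 above can be sketched in a few lines of Python. The example below is a toy greedy overlap merge, not the assembly software used by the project; the function names, the minimum overlap length and the fragment strings are all invented for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest match is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces stay separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three hypothetical overlapping reads from one short stretch of DNA.
reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
print(greedy_assemble(reads))
```

Greedy merging works on this small, repeat-free example; real genomes contain long repeats that make assembly considerably harder, which is one reason physical maps were built first.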


While following the above approaches, it is necessary to choose appropriate sample populations based on the distribution of humans. A primary goal of the Human Genome Project is to generate detailed maps of the human genome. These maps will aid in determining the location of genes within the human genome. More specifically, they will assign genes to their chromosomes. Two types of maps are being generated: genetic linkage maps and physical maps.

Genetic linkage maps determine the relative arrangement of, and approximate distances between, genes and markers on the chromosomes, whereas physical maps specify the physical location (in base pairs) of, and distance between, genes or DNA fragments with unknown functions that are mapped to specific regions of the chromosomes.

Maps have different levels of resolution, ranging from low to high. The degree of resolution that is appropriate depends on whether, for example, a large fragment of DNA is to be studied or a more detailed picture of a small DNA region is needed. A human genomic library consists of random DNA fragments, and is used to establish sets of ordered, overlapping cloned DNA fragments, or contigs, for each chromosome of the genome. In other words, these are high-resolution maps.

After mapping is complete, the DNA must be sequenced to determine the order of all the nucleotide bases of the chromosomes, and the genes in the DNA sequence must be identified. In all phases of the project, a major focus has been on developing instrumentation to increase the speed of data collection and analysis. New, automated technologies are significantly increasing the speed and accuracy of DNA sequencing, while decreasing the cost. Software and database systems manage the data generated from mapping and sequencing projects. Database management systems store and aid in distributing genomic information (Fig. 2.3).

Genetic linkage maps show the order of, and genetic distance between, pairs of linked genes, that is, genes on the same chromosome that determine variable phenotypic traits (the difference between genetic distance and physical distance is explained below). Genetic linkage maps enable geneticists to follow the inheritance of specific traits (that is, genes) as they are passed from generation to generation within families.

Linkage maps also determine the arrangement of genes or markers with unknown functions on the chromosomes. They show the order of linked genes and pairwise distances between their loci. During meiosis, as the haploid egg and sperm cells form, homologous chromosomes (maternally and paternally derived) line up, and DNA segments can be exchanged between the homologs. New combinations of alleles result from this process of homologous recombination. During meiosis, each human chromosome pair is involved, on average, in 1.5 crossover events. The likelihood of crossing over increases as the distance between the two loci increases. Crossing over between two genes or markers on the same chromosome can sometimes occur if there is enough distance between them. If two genes are very close, they are “linked” and recombination is unlikely to occur between them. Thus, the frequency of recombination is a quantitative index of the linear distance between two genes on a genetic linkage map. Distances are measured in centimorgans (cM), named after the famous geneticist, Thomas Hunt Morgan. If genes (for example, A and B) are separated by recombination 1% of the time, that is, if one out of 100 products of meiosis is recombinant, they are 1 cM apart. A genetic distance of 1 cM represents a physical distance of approximately one million base pairs (1 Mb).

Genetic maps are very powerful. An inherited disease gene can be located on the map if a second gene or DNA reference marker is also inherited in individuals with the disease, but is not found in individuals who do not have that disease. Exact chromosomal locations have already been found for many disease genes, including those for fragile X syndrome, cystic fibrosis and Huntington’s disease.
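The relation between recombination frequency and map distance described above can be written as a small calculation. The sketch below assumes the simple proportional rule stated in the text (1% recombination = 1 cM, and roughly 1 Mb per cM); the function names and the example counts are illustrative, and the proportional rule is only a reasonable approximation for closely linked loci.

```python
def map_distance_cm(recombinants: int, total: int) -> float:
    """Genetic distance in centimorgans from counts of meiotic products."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * recombinants / total


def approx_physical_distance_bp(cm: float, bp_per_cm: int = 1_000_000) -> float:
    """Rough physical distance, using the text's figure of about 1 Mb per cM."""
    return cm * bp_per_cm


# Example from the text: one recombinant among 100 products of meiosis.
distance = map_distance_cm(recombinants=1, total=100)
print(distance, "cM")
print(approx_physical_distance_bp(distance), "base pairs (approximate)")
```

The conversion to base pairs is an average; the actual ratio of physical to genetic distance varies along each chromosome.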


Fig. 2.3. Process of determination of DNA sequence from human chromosome.


Many inherited diseases are caused by single genes, and thus can be studied by genetic linkage analysis. Almost 5,000 genetic disorders have been studied in this way. These maps, however, do not relate directly to the physical structure of DNA, and the gene of interest cannot be isolated on the basis of information from genetic linkage maps or human genome mapping alone. Linkage analysis involves the study of family members carrying a particular trait for an inherited disorder. Often, several generations of one family are studied to obtain enough information with which to infer linkage. Some family members must express the trait (gene) or genetic disorder, and the trait must vary among individuals (that is, there must be different alleles or forms of the gene). Analysis also requires that there be individuals who are heterozygous for DNA reference markers or who have a second gene linked to the gene in question. Heterozygous family members (members carrying two different forms of the trait or gene, one dominant and one recessive allele) enable geneticists to determine which chromosome of the homologous pair carries the allele for the genetic disorder, and whether it is passed on to the offspring.

The physical location of the DNA marker on a chromosome can then be found by using the marker sequence as a DNA probe. Polymorphic DNA markers serve as reference points or landmarks to help find a region of DNA that contains the gene of interest. If a gene is found between two DNA markers, the DNA region can be isolated for further study.

An early goal of the investigators of the Human Genome Project was to generate linkage maps with polymorphic DNA markers spaced 2 to 5 cM apart along each chromosome. This goal was reached in 1995. Such a map helps scientists to find genes of interest relative to about 1,500 markers within the genome. Once linkage maps have some 3,300 polymorphic DNA markers, each separated by only 1 cM, gene hunting will be much easier. Thus, for polymorphic DNA markers to be valuable, their linkage with a gene must be established, and their physical locations must be identified through the use of probes. Several large scientific groups working on the human genome are identifying markers to generate comprehensive genetic maps.

2.5. PHYSICAL MAPS

Physical maps specify the exact physical location (in base pairs) of, and distance between, genes, markers, or DNA of unknown function. These maps provide information about the physical organization of the DNA; examples are the location of restriction enzyme sites and the order of restriction fragments of chromosomes. An entire genome can be studied using a library of genomic DNA. The clones in such a library are uncharacterized, random fragments and are not placed in order, as they would be on the chromosome.

As the human genome is very large, large DNA fragments must be cloned into vectors to maintain a manageable number of clones in the library. Yeast artificial chromosomes (YACs) are being used as cloning vectors for the human genome, since the inserted DNA can be up to one million base pairs in length. Human DNA is attached to the yeast DNA and transferred into yeast host cells for replication. Only a small portion of the yeast’s total DNA, i.e., the origin of replication, telomere, and centromere, is required for replication, so most of the YAC DNA is the foreign DNA.

The average insert used in YAC libraries is 200,000 to 400,000 base pairs in length. This range is 10 times larger than inserts used in other libraries, such as those for bacteriophage and cosmids, where up to 20,000 and 40,000 base pairs, respectively, can be cloned. The human genome can be represented by 7,500 YAC clones, which are maintained and amplified in yeast host cells. YACs and their inserts are cut into smaller fragments and recloned or subcloned (for example, into cosmids), so that a detailed map of a YAC clone is obtained. YAC clones are screened by PCR to isolate specific genes of interest. DNA inserts are also analyzed by obtaining restriction maps, identifying polymorphic markers, and/or DNA sequencing. However, without an ordered physical map, i.e., one that refers to actual physical distances in base pairs between landmarks, the location of particular clones cannot be identified.
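The figure of 7,500 YAC clones follows from dividing the genome length by the insert size. The sketch below reproduces that arithmetic under the simplifying assumption of exactly one-fold coverage with no overlap between clones; the function name is hypothetical, and real libraries contain several times more clones so that every region is represented.

```python
def clones_needed(genome_bp: int, insert_bp: int) -> int:
    """Minimum number of non-overlapping inserts needed to span the genome."""
    if insert_bp <= 0:
        raise ValueError("insert_bp must be positive")
    return -(-genome_bp // insert_bp)  # ceiling division on integers


GENOME_BP = 3_000_000_000  # human genome length used throughout the chapter

# Insert sizes quoted in the text for the three vector types.
for vector, insert_bp in [("YAC", 400_000), ("cosmid", 40_000), ("bacteriophage", 20_000)]:
    print(vector, clones_needed(GENOME_BP, insert_bp))
```

The tenfold difference in insert size translates directly into a tenfold difference in the number of clones that must be stored and ordered.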

Another method, fluorescence in situ hybridization (FISH) of probes to metaphase chromosomes, provides information for constructing low-resolution chromosomal maps. Chromosomal maps are actual physical maps because distances are measured in base pairs. Metaphase chromosomes are spread out on a microscope slide, and a solution containing a fluorescent-tagged DNA probe is added. Under the appropriate conditions, the probe hybridizes to its DNA complement on the chromosome and is detected with a fluorescence microscope (Figs. 2.4 & 2.5). The relative orientation of genes and DNA fragments can be assigned to specific chromosomes, and the gaps between mapped cosmids can be bridged. Chromosomal mapping is used to locate genetic markers that are associated with observable traits.

Another type of physical map is the cDNA map, which localizes coding regions (for example, exons) to specific chromosome regions or bands. The cDNA molecules are synthesized from an mRNA template. The cDNA map is probably one of the most important types of map, since it can identify the chromosomal location of specific genes, whether their functions are known or not. Researchers searching for a specific disease-causing gene can use cDNA maps to help locate it after having established a general location by genetic linkage methods.

High-resolution physical maps can be generated by a method that is sometimes called bottom-up mapping. The chromosome is cut into small overlapping fragments, each of which is cloned and the order determined. These fragments form continuous DNA blocks called contigs. The bottom-up method generates a detailed map called a ‘contig’ map. A library of clones ranging from 10,000 base pairs to 1 Mb is used for mapping. Each clone can be localized to specific regions within chromosomal bands. This “linked” library of overlapping clones comprises a chromosomal segment.

The production of human contig maps requires several steps. First, a library must be made that represents the human genome, either the entire genome or a segment, in cloned DNA fragments. The DNA fragments within each clone must overlap other fragments. Overlap is accomplished by cutting the DNA with a specific restriction enzyme. If every restriction site on the DNA were cut, fragments would not overlap. Therefore, enzyme digestion is conducted in such a way that only some of the restriction sites are cut. This partial digestion randomly leaves many sites uncut, so that overlapping DNA fragments are produced and the order along the chromosomes can be determined.

Fig. 2.4. A microscopic preparation (metaphase squash before a karyotype is made) of human chromosomes showing the differences in size and banding patterns of the chromosomes.

Fig. 2.5. Fluorescence of chromosome position by probes in fluorescence in situ hybridization.

The order of the clones or contigs can be determined by identifying the overlaps in the DNA fragments. Overlap can be detected when some of the DNA bands are the same, i.e., two clones have bands in common. This method of assembling pairs of clones into contigs is difficult and time-consuming.
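Detecting overlap from bands in common can be sketched as a comparison of restriction fingerprints. In the example below each clone is represented by the set of fragment sizes seen on a gel; the clone names, band sizes and the threshold of two shared bands are invented for illustration, and real fingerprinting must also allow for measurement error in band sizes.

```python
from itertools import combinations


def shared_bands(bands_a, bands_b) -> int:
    """Number of restriction fragment sizes that two clones have in common."""
    return len(set(bands_a) & set(bands_b))


def overlapping_pairs(fingerprints: dict, min_shared: int = 2) -> list:
    """Clone pairs sharing at least min_shared bands, as (name, name, count)."""
    pairs = []
    for a, b in combinations(sorted(fingerprints), 2):
        n = shared_bands(fingerprints[a], fingerprints[b])
        if n >= min_shared:
            pairs.append((a, b, n))
    return pairs


# Hypothetical fingerprints: fragment sizes in base pairs for three clones.
fingerprints = {
    "cloneA": {1200, 3400, 560, 8000},
    "cloneB": {3400, 560, 2100, 950},
    "cloneC": {2100, 950, 4300},
}
print(overlapping_pairs(fingerprints))
```

Here cloneA overlaps cloneB and cloneB overlaps cloneC, which suggests the order A, B, C within a contig.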

Automation and sophisticated computer algorithms may increase the efficiency. Different approaches may be used to fill in the gaps that are likely to be present even after researchers generate detailed physical maps. One example is microdissection, which is used to physically cut a piece of DNA from a specific region of a chromosome. This chromosomal piece can be cut into smaller fragments by restriction enzymes, cloned, mapped, and sequenced by standard methods.

An alternative method is “chromosome walking”, in which a small region at the end of the DNA fragment is used as a probe to screen the library for the adjacent clone. A DNA piece at the end of this second cloned fragment is used as the next probe. This process continues until a complete physical map has been obtained. Since the human genome is divided into chromosomes, chromosome-specific libraries can be constructed so that each chromosome has a contig map.
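Chromosome walking is an iterative procedure, so it lends itself to a short illustration. The sketch below treats clones as plain strings and "screens" the library by text search; the clone sequences, the probe length of four bases and the function name are assumptions made for this toy example, whereas a real walk uses hybridization with much longer probes.

```python
def chromosome_walk(library, start: str, probe_len: int = 4) -> list:
    """Walk from start, using the end of each clone as a probe for the next one."""
    path = [start]
    visited = {start}
    current = start
    while True:
        probe = current[-probe_len:]  # small region at the end of the clone
        next_clone = None
        for clone in library:
            if clone in visited:
                continue
            pos = clone.find(probe)
            # The clone must contain the probe and extend beyond it.
            if pos != -1 and pos + probe_len < len(clone):
                next_clone = clone
                break
        if next_clone is None:
            return path  # no further clone found: the walk ends here
        path.append(next_clone)
        visited.add(next_clone)
        current = next_clone


# Hypothetical unordered library of overlapping clones.
library = ["TTACGGATCCT", "AAGCTTGGCC", "ATCCTTAGGCA", "TGGCCATTACG"]
print(chromosome_walk(library, start="AAGCTTGGCC"))
```

The walk visits the clones in their chromosomal order even though the library list is unordered, which is the point of the technique.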

Mapping is simplified if each chromosome is separated from the others before being cut by restriction enzymes and cloned to make libraries. Twenty-four libraries are required: 22 autosomal libraries, and one each for the X and Y chromosomes. The several types of maps range from coarse to fine resolution.

The map with the lowest resolution is the genetic map, which measures the frequency of recombination between linked markers (which can be genes or noncoding DNA). The next level of resolution is the restriction map, on which DNA restriction fragments ranging from 1 to 2 Mb are separated and mapped. The next higher level of resolution is achieved by placing in order 400,000 to 1,000,000 base pair fragments of overlapping clones from libraries of YAC clones. These clones are then further subcloned (with insert sizes of 20,000 to 40,000 base pairs) into other vectors to produce contig maps. Finally, the DNA base sequence map, which has the finest resolution, is determined.

Sequence-Tagged Sites

In the sequencing approaches, the Human Genome Project requires that the collected genome information be shared. A major problem is that investigators from different laboratories use a variety of methods for generating and mapping DNA fragments, thus making correlations difficult when data from different laboratories are compared. Therefore, to solve this problem, a universal reference system has been developed. Unique regions of 200 to 500 base pairs of partially sequenced DNA are used to identify clones, contigs, and long stretches of DNA. These sequence-tagged sites (STSs) are standard markers that are used for physical mapping. An STS can also be a region of cDNA, i.e., an expressed sequence, in which case it is called an expressed-sequence tag (EST). ESTs are used to represent landmarks along the map, thus helping to identify the regions where pairs of clones overlap.
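The defining property of an STS, that it identifies one place in the genome and no other, can be checked mechanically. The following sketch counts occurrences of a candidate site in a sequence; the function names and the short toy sequences are assumptions for illustration, since real STSs are 200 to 500 base pairs long and are tested against the whole genome.

```python
def count_occurrences(genome: str, site: str) -> int:
    """Count occurrences of site in genome, including overlapping ones."""
    if not site:
        raise ValueError("site must not be empty")
    count = 0
    start = 0
    while True:
        pos = genome.find(site, start)
        if pos == -1:
            return count
        count += 1
        start = pos + 1  # move one base on so overlapping matches are counted


def is_sts_candidate(genome: str, site: str) -> bool:
    """True if site occurs exactly once, so it can serve as a unique landmark."""
    return count_occurrences(genome, site) == 1


toy_genome = "ACGTACGTTTGACCA"
print(is_sts_candidate(toy_genome, "TTGAC"))  # occurs once
print(is_sts_candidate(toy_genome, "ACGT"))   # occurs twice
```

A site that passes this test can be used by any laboratory to name the same region, which is what makes the STS system a shared reference.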

These special sequences constitute a “universal mapping language”, enabling everyone to refer to a specific region of the genome by the same name, and enabling investigators to share information and


humans, especially in proteins involved in development and immunity.

The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%) and the fly (3%).

Although humans appear to have stopped accumulating repeated DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, though gene estimates are similar in these species. Scientists have proposed many theories to explain evolutionary contrasts between humans and other organisms, including those of life span, litter sizes, inbreeding, and genetic drift.

Variations and Mutations

US Human Genome Project Research Goals

The completion of the human DNA sequence in the spring of 2003 coincided with the 50th anniversary of Watson and Crick’s description of the fundamental structure of DNA. The analytical power arising from the reference DNA sequences of entire genomes and other genomics resources has jump-started what some call the “biology century”.

The Human Genome Project was marked by accelerated progress. In June 2000, the rough draft of the human genome was completed a year ahead of schedule. In February 2001, the working draft sequence and analyses were published in special issues of Science and Nature.

The project’s first 5-year plan, intended to guide research in financial years 1990-1995, was revised in 1993 due to unexpected progress, and the second plan outlined goals through FY 1998. The third and final plan (Science, 23 October 1998) was developed during a series of DOE and NIH workshops. Some 18 countries have participated in the worldwide effort, with significant contributions from the Sanger Center in the United Kingdom, and research centers in Germany, France and Japan.

Difference Between Draft Sequence and Finished Sequence

To generate the high-quality reference sequence, completed in April 2003, additional sequencing was done to close the gaps and reduce the ambiguities. Further, only a single error was allowed for every 10,000 bases, the agreed-upon standard for the HGP. Investigators believe that a high-quality sequence is critical for recognizing regulatory components of genes that are very important in understanding human biology and disorders such as heart disease, cancer, and diabetes. The genomes that have been sequenced completely are shown in Table 2.2.

The small genomes of several viruses and bacteria, and the much larger genomes of three higher organisms, have been completely sequenced. The latter are bakers’ or brewers’ yeast (Saccharomyces cerevisiae), the roundworm (Caenorhabditis elegans) and the fruit fly (Drosophila melanogaster). In October 2001, the draft sequence of the pufferfish Fugu rubripes, the first vertebrate after the human, was completed; scientists had finished the first genetic sequence of a plant, the weed Arabidopsis thaliana, in December 2000. Many more genomes have been sequenced since then.

The Human Genome Project, also called the Human Genome Initiative, is a scientific research effort to analyze the DNA of humans and of several lower organisms. The project began in the United States in 1990 under the sponsorship of the US Department of Energy and the National Institutes of Health. Projects undertaken concurrently in Japan, the United Kingdom, Italy, France, and Russia are coordinated with the American effort through the Human Genome Organization.

The ultimate goal of the project is to identify the chromosomal location of every human gene, and to determine the precise chemical structure of each gene in order to elucidate its function in health and disease. The information gathered is expected to serve as the basic reference for research in human biology and medicine in the 21st century, and to provide fundamental insights into the genetic basis of human diseases. The new technologies developed in the course of the project will be applicable in numerous other fields of biomedical endeavour.

Each cell of an organism has a set of chromosomes containing the heritable genetic material that directs its development, i.e., its genome. The genetic material of chromosomes is DNA. Each of the paired strands of the DNA molecule is a linear array of subunits called nucleotides, or bases, of which there are four types: adenine, cytosine, thymine, and guanine. Genes are discrete stretches of nucleotides that carry the information used by the cell to synthesize proteins.

Human genes take up only about 5 to 10% of the DNA. Some of the remaining DNA, which does not code for proteins, may regulate whether or not proteins are made, but the function of most of it is unknown.

This landmark scientific achievement represented the completion of the first stage of the project. Initial results published by both groups in February 2001 declared that the human genome actually contains only about 30,000 to 40,000 genes, far fewer than originally thought. Two types of maps were constructed: genetic linkage maps and physical maps. A genetic linkage map provides the relative location of genes and other markers on the basis of how frequently genes are inherited together; the closer genes are to each other on a chromosome, the more likely they are to be inherited together. Physical maps locate genes in relation to the presence of known nucleotide sequences that act as landmarks along the length of a chromosome.

One such “marker” used to map the human genome is a sequence-tagged site (STS), a short sequence of nucleotides that occurs only once throughout the genome. A relatively detailed physical map was needed before sequencing could begin. Sequencing, in which the precise order of the nucleotides is determined, was the most technically challenging part of the project.

Table 2.2. The list of organisms whose genome sequence is completed.

Group         Organism                    Genome size    Haploid number
Virus         MS2                         4 kb           1
              SV40                        5 kb           1
              φX174                       5 kb           1
              M13                         6 kb           1
              λ                           50 kb          1
              Herpes simplex              152 kb         1
              T2, T4, T6                  165 kb         1
              Smallpox                    267 kb         1
Prokaryotes   Borrelia burgdorferi        910 kb         1
              Methanococcus jannaschii    1600 kb        1
              E. coli                     4600 kb        1
Eukaryotes    Saccharomyces cerevisiae    13 Mb          16
              Caenorhabditis elegans      97 Mb          6
              Arabidopsis thaliana        100 Mb         5
              Drosophila melanogaster     180 Mb         4
              Homo sapiens                3000 Mb        23
              Zea mays                    4500 Mb        10
              Fugu rubripes               400 Mb         22
              Amphiuma means              90,000 Mb      14

DNA sequencing of the yeast Saccharomyces cerevisiae was completed in 1996, and that of the nematode worm Caenorhabditis elegans in 1998. The DNA sequencing of the other organisms was completed in the following order:

(1) E. coli: 1997.

(2) Fruit fly (Drosophila melanogaster) and plant Arabidopsis thaliana: 2000.

(3) The laboratory mouse (Mus musculus) and bacterium Staphylococcus aureus: 2001.

The rationale for these efforts is that many genes with similar functions in disparate organisms have been conserved in evolution and show surprising similarities. Genes from simpler organisms can thus be used to study human beings.

Another objective of the Human Genome Project is to address the ethical, legal, and social implications of the information obtained. Society will derive the greatest benefit from this knowledge only if it takes measures to prevent abuses, such as invasion of the privacy of an individual’s genetic background by employers, insurers, or government agencies, or discrimination based on genetic grounds.

The HGP was the natural culmination of the history of genetics research. In 1911, Alfred Sturtevant, then an undergraduate researcher in the laboratory of Thomas Hunt Morgan, realized that he could (and had to, in order to manage his data) map the locations of the fruit fly (Drosophila melanogaster) genes whose mutations the Morgan laboratory was tracking over generations. Sturtevant’s very first gene map can be likened to the Wright brothers’ first flight at Kitty Hawk. In turn, the Human Genome Project can be compared to the Apollo program bringing humanity to the moon.

The hereditary material of all multicellular organisms is the famous double helix of deoxyribonucleic acid (DNA), which contains all of our genes. DNA, in turn, is made up of four chemical bases, pairs of which form the “rungs” of the twisted, ladder-shaped DNA molecules. All genes are made up of stretches of these four bases, which are arranged in different ways and in different lengths. HGP researchers have deciphered the human genome in three major ways: determining the order, or “sequence”, of all the bases in our genome’s DNA; making maps that show the locations of genes in major sections of all our chromosomes; and producing what are called linkage maps, complex versions of the type originated in early Drosophila research, through which inherited traits (such as those for genetic disease) can be tracked over generations.

The HGP has revealed that there are probably somewhere between 30,000 and 40,000 human genes, and their locations can now be identified. This ultimate product of the HGP has given the world a resource of detailed information about the structure, organization and function of the complete set of human genes. This information can be thought of as the basic set of inheritable “instructions” for the development and functioning of a human being.

The International Human Genome Sequencing Consortium published the first draft of the human genome in the journal ‘Nature’ in February 2001, with the sequence of the entire genome’s three billion base pairs some 90 percent complete. A startling finding of this first draft was that the number of human genes appeared to be significantly fewer than previous estimates, which ranged from 50,000 genes to as many as 140,000. The full sequence was completed and published in April 2003.

How to Sequence

The task of determining the complete sequence of the 3,200,000,000 bases of the human genome (30× the size of the nematode genome) was extremely daunting at the time when the project was formally launched. Several lines of investigation focused on an alternative approach to characterize the human genome. For example, complete genome sequencing may be bypassed by selectively sequencing just expressed sequences, obtained by extracting mRNA from a wide range of human tissues. Large-scale expressed sequence tag (EST) projects in both the public and private domains resulted in the collection of huge amounts of sequence information on human genes. An international consortium to map the ESTs in the genome, using the genetic map as a framework, resulted in the human gene map of 35,000 genes. This was an important and valuable milestone in the HGP. However, the sequence of most of the mRNAs was incomplete, an unknown number of genes were missing from the collection, and there was no information available on gene structures.

In contrast, the experience gained from the study of smaller genomes, especially those of the nematode and yeast, illustrated the enormous potential to obtain a complete set of genes, gene structures and all other genetic information by determining the complete sequence of the genome. Furthermore, by breaking the task into manageable segments, and using a physical map to co-ordinate the work, it was possible to undertake projects to sequence genomes that were far beyond the capabilities of a simple shotgun approach. For the human genome, therefore, the strategy adopted was to use the landmarks provided by the genetic map, and later the gene map, as a framework to anchor a physical map of overlapping clones which represented all human chromosomes. Initially, much of the work was done using yeast artificial chromosomes (YACs), a yeast cloning system which accepts very large fragments and thus allows a physical map to be built quickly over very large distances. However, the development of new bacterial cloning systems called BACs or PACs (bacterial or P1-derived artificial chromosomes), which were capable of taking large inserts (up to 250 kb), made it possible to make long-range maps directly in bacterial clones. This, coupled with the greater convenience and stability of bacterial clones compared to YACs, resulted in the choice of this system for construction of the physical map to provide the tile path of clones for sequencing.
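Choosing a tile path from a physical map can be pictured as picking a small set of overlapping clones that together span a region. The sketch below is a hypothetical illustration, not the procedure used by the sequencing centres. It assumes each clone already has known start and end coordinates on the map, and the clone names and coordinates are invented.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end), coordinates in kb (assumed known)


def tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[str]:
    """Greedily pick clones that together cover [region_start, region_end]."""
    path: List[str] = []
    covered_to = region_start
    remaining = sorted(clones, key=lambda c: c[1])  # order by start coordinate
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones that start within the covered part, keep the one
        # that extends furthest to the right.
        while i < len(remaining) and remaining[i][1] <= covered_to:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in the physical map at {covered_to} kb")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical clones on a 450 kb region.
clones = [
    ("BAC_A", 0, 150),
    ("BAC_B", 100, 250),
    ("BAC_C", 120, 300),
    ("BAC_D", 280, 450),
    ("BAC_E", 200, 320),
]
print(tiling_path(clones, 0, 450))  # ['BAC_A', 'BAC_C', 'BAC_D']
```

The greedy rule keeps the number of clones to be sequenced low. A practical map would also require a minimum overlap between neighbouring clones so that their sequences can be joined reliably.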

Each BAC or PAC has been sequenced using a random shotgun approach. This approach is essentially the same as was developed for the early whole genome sequencing projects. DNA from the BAC or PAC is broken up randomly into short fragments (typically 1-2 kb long), which are subcloned into plasmids or the bacteriophage M13 cloning vector. The resulting subclones (transformed bacterial colonies) are picked at random and cultured, and the subclone DNA is extracted for use as a sequencing template. A primer (short DNA strand) is hybridized to the template within the vector sequence (which is common to all clones). This provides a starting point for DNA polymerase to synthesize new strands of DNA by incorporating the deoxynucleotide triphosphates (dNTPs), which are the single-base precursors of DNA.

Fluorescently labeled analogues for each base are included in the same reaction (dideoxy NTPs or ddNTPs); a different fluorescent tag is used for each of the four bases. These analogues terminate the chain in a base-specific manner when they are incorporated (hence they are called “chain terminators”). The product of the reaction is a ladder of newly synthesized DNA fragments of increasing size in single-base increments. Each fragment in the mixture is terminated at a specific position, and the terminating base can be identified from its fluorescent label. The fragments are separated on the basis of size by electrophoresis, either through polyacrylamide “slab” gels or, more recently, through a viscous liquid matrix held in individual capillaries (capillary gel electrophoresis). The ladder of colored bands thus represents the sequence of the bases in the DNA, and can be read automatically by a fluorescence detector.
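How a ladder of labeled fragments translates into a base sequence can be shown with a short Python sketch. The dye-to-base assignment and the fragment list are assumptions made for illustration; real sequencing chemistries use their own dye sets, and real base callers work on noisy signal traces rather than clean (length, colour) pairs.

```python
# Assumed dye-to-base assignment, for illustration only.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_ladder(fragments):
    """Read a sequence from (length_in_bases, dye_colour) pairs given in any order."""
    ordered = sorted(fragments)  # electrophoresis separates by size, shortest first
    if not ordered:
        return ""
    lengths = [length for length, _ in ordered]
    # Each single-base size step must be present exactly once.
    if lengths != list(range(lengths[0], lengths[0] + len(lengths))):
        raise ValueError("ladder has a missing or duplicated fragment size")
    # The dye on each fragment reports the base at which that fragment ends.
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


# Hypothetical fragments detected in one capillary.
fragments = [(23, "red"), (21, "green"), (24, "blue"), (22, "yellow"), (25, "green")]
print(read_ladder(fragments))  # AGTCA
```

The sequence read this way is that of the newly synthesized strand, which is complementary to the template.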

The sequences of all the subclones of a single BAC or PAC are analyzed together. All overlaps between sequences are identified, and the individual reads are assembled into contigs. A consensus sequence is obtained at this stage, and


Although the HGP is finished, analyses of the data will continue for many years. An important feature of the HGP was the federal government’s long-standing dedication to the transfer of technology to the private sector. By licensing technologies to private companies and awarding grants for innovative research, the project catalyzed the multibillion-dollar US biotechnology industry, and fostered the development of new medical applications.

Rapid progress in genome science and a glimpse into its potential applications have spurred observers to predict that biology will be the foremost science of the 21st century. Technology and resources generated by the Human Genome Project and other genomics research are already having a major impact on research across the life sciences. The potential for commercial development of genomics research presents US industry with a wealth of opportunities, and sales of DNA-based products and technologies in the biotechnology industry are projected to exceed $45 billion by 2009 (Consulting Resources Corporation Newsletter, Spring 1999).

Current and Potential Applications of Genome Research Include the Following:

• Molecular medicine
• Energy sources and environmental applications
• Risk assessment
• Bioarchaeology, anthropology, evolution, and human migration
• DNA forensics (identification)
• Agriculture, livestock breeding and bioprocessing

Molecular medicine:

• Improved diagnosis of disease
• Earlier detection of genetic predispositions to disease
• Rational drug design
• Gene therapy and control systems for drugs
• Pharmacogenomics “custom drugs”

Broader applications reaching into many areas of the economy include the following:

• Clinical medicine: Many more individualized diagnostics and prognostics, drugs, and other therapies.
• Agriculture and livestock: More nutritious and healthier crops and animals.
• Industrial processes: Cleaner and more efficient manufacturing in sectors such as chemicals, pulp and paper, textiles, food, fuels, metals, and minerals.
• Environmental biotechnology: Biodegradable products, new energy resources, environmental diagnostics and less hazardous cleanup of mixed toxic-waste sites.
• DNA fingerprinting: Identification of humans and other animals, plants and microbes; evolutionary and human anthropological studies; and detection of and resistance to harmful agents that might be used in biological warfare.

Technology and resources promoted by the Human Genome Project are beginning to have profound impacts on biomedical research, and promise to revolutionize the wider spectrum of biological research and clinical medicine. Increasingly detailed genome maps have aided researchers seeking genes associated with dozens of genetic conditions, including myotonic dystrophy, fragile X syndrome, neurofibromatosis, diabetes types 1 and 2, inherited colon cancer, Alzheimer’s disease and familial breast cancer.

On the horizon is a new era of molecular medicine, characterized less by treating symptoms and more by looking to the most fundamental causes of disease. Rapid and more specific diagnostic tests will make earlier treatment of countless maladies possible. Medical researchers will also be able to devise novel therapeutic regimens based on new classes of drugs, immunotherapy techniques, avoidance of environmental conditions that may trigger disease, and possible augmentation or even replacement of defective genes through gene therapy.


reassembling DNA fragments in their original order. This repeated sequencing is known as genome “depth of coverage”. Draft sequence data is mostly in the form of 10,000 base pair-sized fragments whose approximate chromosomal locations are known.
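Depth of coverage is commonly estimated as the total number of bases read divided by the length of the target. The Python sketch below uses that standard formula; the clone size, read length and target depth are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def depth_of_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Average number of times each base of the target is sequenced."""
    return n_reads * read_length / target_size


def reads_needed(target_depth: float, read_length: int, target_size: int) -> int:
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(target_depth * target_size / read_length)


# Hypothetical example: a 150,000 bp clone sequenced with 500 bp reads.
clone_size = 150_000
read_length = 500
n = reads_needed(8, read_length, clone_size)
print(n)                                                   # 2400
print(depth_of_coverage(n, read_length, clone_size))       # 8.0
```

This is an average only. Because fragments are sampled at random, some bases are read more often than others, which is one reason a target is sequenced several times over.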

In June 2000, the Human Genome Project and Celera Genomics, a privately owned firm founded in 1998, jointly announced the completion of the initial sequencing of the human genome, which is composed of about three billion nucleotide base pairs.

Developing the Tools and Technologies for the Success of HGP

The DOE investments described below helped to make the Human Genome Project a success. Substantial investments by the NIH and the Wellcome Trust in the UK were equally important, however, and should not be overlooked. In most cases, the DOE achievements outlined below were the result of basic research programs. Research is an incremental process that learns from both the successes and failures of other research investments, including those of other agencies and organizations. Furthermore, no single instrument, technology, reagent, or protocol made high-throughput DNA sequencing possible; many contributors were responsible.

DNA Sequencers

Research on capillary-based DNA sequencing contributed to the development of two major DNA sequencing machines: the Perkin-Elmer 3700 and the MegaBace DNA sequencers. The MegaBace DNA sequencer was developed initially with DOE funds by Dr. Richard Mathies at UC, Berkeley. The Perkin-Elmer 3700 was based, in part, on DOE-funded research by Dr. Norman Dovichi at the University of Alberta. These high-throughput instruments are one of the keys to the success of the genome project.

Fluorescent Dyes

DNA sequencing originally used radiolabeled DNA subunits. DOE-funded research contributed to the development of fluorescent dyes, which increased the accuracy and safety of DNA sequencing as well as the ability to automate the procedures.

DNA Cloning Vectors

Before large DNA molecules are sequenced, they are cut into small pieces and multiplied, or cloned, into numerous copies using microbial-based “cloning” vectors. Today, the bacterial artificial chromosome (BAC) is the most commonly used vector for initial DNA amplification before sequencing. These cloning vectors were developed with DOE funds.

BAC-End Sequencing

The widely agreed-upon strategy for sequencing the human genome is based on the use of BACs, which carry fragments of human DNA from known locations in the genome. DOE-funded research at the Institute for Genomic Research in Rockville, Maryland, and at the University of Washington provided the sequencing community with a complete set of over 450,000 BAC-based genetic “markers” corresponding to a sequence tag every 3 to 4 kilobases across the entire human genome. These markers were needed to assemble both the draft and the final human DNA sequence.

Gene Recognition and Assembly Internet Link (GRAIL)

Gene Recognition and Assembly Internet Link (GRAIL) is one of the most widely used computer programs for identifying the potential genes in DNA sequence and for general DNA sequence analysis. This powerful analytical tool was developed with DOE funds by Dr. Ed Uberbacher at Oak Ridge National Laboratory. Although a


• Structural genomics: Initiatives are being launched worldwide to generate the 3-D structures of one or more proteins from each protein family, thus offering clues to function and biological targets for drug design.

• Experimental methods for understanding the function of DNA sequences and the proteins they encode include knockout studies, which inactivate genes in living organisms and monitor any changes that could reveal their functions.

• Comparative genomics: Analyzing DNA sequence patterns of humans and well-studied model organisms side-by-side has become one of the most powerful strategies for identifying human genes and interpreting their function.

2.9. FUTURE OF HGP IN MEDICINE AND GENETICS

The medical industry is building upon the knowledge, resources, and technologies emanating from the HGP to further understanding of genetic contributions to human health. As a result of this expansion of genomics into human health applications, the field of genomic medicine was born. Genetics is playing an increasingly important role in the diagnosis, monitoring and treatment of diseases.

Diagnosing and Predicting Disease and Disease Susceptibility

All diseases have a genetic component (Fig. 2.6), whether inherited or resulting from the body’s response to environmental stresses like viruses or toxins. The success of the HGP has even enabled researchers to pinpoint errors in genes (the smallest units of heredity) that cause or contribute to disease.

The ultimate goal is to use this information to develop new ways to treat, cure, or even prevent the thousands of diseases that afflict humankind. But the road from gene identification to effective treatments is long and fraught with challenges. In the meantime, biotechnology companies are racing ahead with commercialization by designing diagnostic tests to detect errant genes in people suspected of having particular diseases or of being at risk for developing them.

Fig. 2.6. Diagram of human chromosome 19 showing the locations of selected defective genes and genetic markers.


analyzing and addressing the ethical, legal and social implications of human genetics research at the same time that the basic scientific issues are being studied. In this way, problem areas can be identified, and solutions developed before the scientific information is integrated into health care practice.

The ELSI Program is viewed as essential to the success of the genome project in the United States, and is supported by federal HGP funds. The National Institutes of Health’s ‘National Human Genome Research Institute (NHGRI)’ has committed 5% of its annual research budget to study ELSI issues. The US Department of Energy Office of Energy Research, NHGRI’s partner in the US Human Genome Project, also reserves a portion of its funding for ELSI research and education.

The ELSI program anticipates and addresses the implications of mapping and sequencing the human genome for individuals and society. It also examines the ethical, legal and social consequences of mapping and sequencing the human genome; stimulates public discussion of the issues; and develops policy options that would assure that the information is used for the benefit of individuals and society.

The Working Group envisioned a program that would anticipate potential problems before they actually occur, and identify possible solutions for the problems. It suggested a number of means for accomplishing these goals. Specifically, it encouraged the research community to explore and gather data on a wide range of issues pertinent to the human genome program that could be used to develop educational programs, policy recommendations or possible legislative solutions. A number of areas for focus were identified, including fairness in the use of genetic information, the impact of knowledge of genetic variation on individuals, and the privacy and confidentiality of genetic information, to name a few.

In 1990, in response to the Working Group’s report, the NHGRI established the ELSI Branch (later renamed the ELSI Research Program) in its Division of Extramural Research, and the DOE established an ELSI program in its Office of Energy Research. Since the beginning, these two programs have collaborated closely, including the joint support of the ELSI Working Group, the development of complementary research priority areas, and the co-funding of ELSI activities of mutual interest.

SUMMARY

Humans are higher in the hierarchy of living organisms because of their capacity for independent thought and imagination. Thus, understanding the human genome and its contents gives an idea about how a simple, single-celled zygote develops into a complex individual. The Human Genome Project, also called the Human Genome Initiative, is a scientific research effort to analyze the DNA of humans and of several lower organisms. The project began in the United States in 1990 under the sponsorship of the US Department of Energy and the National Institutes of Health. Projects undertaken concurrently in Japan, the United Kingdom, Italy, France, and Russia are coordinated with the American effort through the Human Genome Organization.

The project’s ultimate goal is to identify the chromosomal location of every human gene, and to determine each gene’s precise chemical structure in order to elucidate its function in health and disease. The information gathered is expected to serve as the basic reference for research in human biology and medicine in the 21st century, and to provide fundamental insights into the genetic basis of human disease. The new technologies developed in the course of the project will be applicable in numerous other fields of biomedical endeavour.

The total number of genes is estimated to be 30,000, which is much lower than previous estimates of 80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite of gene-rich and gene-poor areas.


The functions are unknown for over 50% of the discovered genes. Of the total three billion base pairs of the human genome, less than 2% code for functional proteins. Repeated sequences that do not code for proteins (“junk DNA”) make up at least 50% of the human genome. Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes. Chromosome 1 has the most genes (2968) and the Y chromosome has the fewest (231).

Genome sequencing is the term used to describe the laboratory process of reading the order of the four letters of the genetic alphabet (A, C, G, T) along a strand of DNA. The various steps involved in such efforts are as follows: selection of suitable sample materials for the DNA; isolation of DNA from cells and preparation of large samples of high-quality DNA from these cells; cutting the purified DNA at random sites into manageably sized, overlapping pieces; insertion of these DNA pieces into packages for the production of limitless copies of the selected DNA; recording the order of bases for each DNA piece by using DNA sequencing techniques; determination of the overlap of each piece; and assembling the sequences to give the final genome of the human.
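The last two steps, finding overlaps and assembling the pieces, can be sketched in Python with a simple greedy merge. This is a toy illustration under strong assumptions (error-free reads, all from the same strand, no repeated sequences), not the assembly software used by the project. The function names and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the two sequences that share the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads taken from the toy sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGTA", "CGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

If the reads do not all overlap, the function returns several contigs rather than one, which mirrors the gaps that remain in a draft sequence. Repeated sequences can mislead a greedy merge, and this is one reason a physical map is used to anchor the pieces.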

While following the above approaches, it is necessary to choose appropriate sample populations based on the distribution of humans. A primary goal of the Human Genome Project is to generate detailed maps of the human genome. These maps will aid in determining the location of genes within the human genome. More specifically, they will assign genes to their chromosomes. Two types of maps are being generated. Genetic linkage maps determine the relative arrangement and approximate distances between genes and markers on the chromosomes; physical maps specify the physical location (in base pairs) and distance between genes or DNA fragments with unknown functions that are mapped to specific regions of the chromosomes.
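The idea behind a genetic linkage map can be sketched in Python: the fraction of offspring in which two markers are separated by recombination serves as an approximate distance, and among three linked markers the pair with the largest distance lies at the two ends. The marker names and offspring counts below are invented for illustration. The conversion of 1% recombination to about one map unit (centimorgan) is a standard convention that holds only for short distances.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring in which two markers were separated."""
    return recombinants / total


def order_three(freqs):
    """Order three markers, given {frozenset({m1, m2}): recombination frequency}."""
    outer = max(freqs, key=freqs.get)   # the most distant pair sits at the ends
    markers = set().union(*freqs)       # all three marker names
    (middle,) = markers - outer         # the remaining marker lies between them
    first, last = sorted(outer)
    return [first, middle, last]


# Hypothetical counts from 1000 offspring.
total = 1000
freqs = {
    frozenset({"M1", "M2"}): recombination_frequency(50, total),
    frozenset({"M2", "M3"}): recombination_frequency(120, total),
    frozenset({"M1", "M3"}): recombination_frequency(160, total),
}
print(order_three(freqs))  # ['M1', 'M2', 'M3']
```

In this example M1 and M2 would be placed about 5 map units apart and M2 and M3 about 12. The M1 to M3 value (16) is slightly less than the sum because double crossovers go undetected.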

In the Human Genome Project, importance is also given to the sequencing of other model organisms. DNA sequencing of the nematode worm Caenorhabditis elegans and the yeast Saccharomyces cerevisiae was completed in 1996, the bacterium Escherichia coli in 1997, the fruit fly (Drosophila melanogaster) and the plant Arabidopsis thaliana in 2000, and the laboratory mouse (Mus musculus) and the bacterium Staphylococcus aureus in 2001. The rationale behind these efforts is that many genes with similar functions in disparate organisms have been conserved in evolution and show surprising similarities. Genes from simpler organisms can thus be used to study their counterparts found in human beings.

Another objective of the Human Genome Project is to address the ethical, legal, and social implications of the information obtained. Society will derive benefit from this knowledge only if it takes measures to prevent abuses, such as invasion of the privacy of an individual’s genetic background by employers, insurers, or government agencies, or discrimination based on genetic grounds.

A large number of advances in diverse fields, including molecular biology, genetic engineering and sequencing, provided a great impetus to the Human Genome Project. These technological developments dramatically decreased the cost of DNA sequencing, while increasing its speed and efficiency. For example, it took four years for the international Human Genome Project to produce the first billion base pairs of sequence, and less than four months to produce the second billion base pairs. In January 2003, the DOE team sequenced 1.5 billion bases. The cost of sequencing has dropped dramatically since the project began and is still dropping rapidly. Other major factors involved in cost and time reduction were greatly improved