what is a genome

Upload: surajit-bhattacharjee

Post on 04-Jun-2018

  • 8/13/2019 What is a Genome

    WHAT IS A GENOME?

    Life is specified by genomes. Every organism, including humans, has a genome that contains all of the biological information needed to build and maintain a living example of that organism. The biological information contained in a genome is encoded in its deoxyribonucleic acid (DNA) and is divided into discrete units called genes. Genes code for proteins that attach to the genome at the appropriate positions and switch on a series of reactions called gene expression.

    In 1909, Danish botanist Wilhelm Johannsen coined the word gene for the hereditary unit found on a chromosome. Nearly 50 years earlier, Gregor Mendel had characterized hereditary units as factors: observable differences that were passed from parent to offspring. Today we know that a single gene consists of a unique sequence of DNA that provides the complete instructions to make a functional product, called a protein. Genes instruct each cell type, such as skin, brain, and liver, to make discrete sets of proteins at just the right times, and it is through this specificity that unique organisms arise.

    The Physical Structure of the Human Genome

    Nuclear DNA

    Inside each of our cells lies a nucleus, a membrane-bounded region that provides a sanctuary for genetic information. The nucleus contains long strands of DNA that encode this genetic information. A DNA chain is made up of four chemical bases: adenine (A) and guanine (G), which are called purines, and cytosine (C) and thymine (T), referred to as pyrimidines. Each base has a slightly different composition, or combination of oxygen, carbon, nitrogen, and hydrogen. In a DNA chain, every base is attached to a sugar molecule (deoxyribose) and a phosphate molecule, resulting in a nucleotide, the repeating unit of a nucleic acid. Individual nucleotides are linked through the phosphate group, and it is the precise order, or sequence, of nucleotides that determines the product made from that gene.

    Figure 1. The four DNA bases.

    Each DNA nucleotide is made up of the sugar 2'-deoxyribose linked to a phosphate group and one of the four bases depicted above: adenine (top left), cytosine (top right), guanine (bottom left), and thymine (bottom right).

    A DNA chain, also called a strand, has a sense of direction, in which one end is chemically different from the other. The so-called 5' end terminates in a 5' phosphate group (-PO4); the 3' end terminates in a 3' hydroxyl group (-OH). This is important because DNA strands are always synthesized in the 5' to 3' direction.

    The DNA that constitutes a gene is a double-stranded molecule consisting of two chains running in opposite directions. The chemical nature of the bases in double-stranded DNA creates a slight twisting force that gives DNA its characteristic gently coiled structure, known as the double helix. The two strands are connected to each other by chemical pairing of each base on one strand to a specific partner on the other strand. Adenine (A) pairs with thymine (T), and guanine (G) pairs with cytosine (C). Thus, A-T and G-C base pairs are said to be complementary.

    This complementary base pairing is what makes DNA a suitable molecule for carrying our genetic information: one strand of DNA can act as a template to direct the synthesis of a complementary strand. In this way, the information in a DNA sequence is readily copied and passed on to the next generation of cells.
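
    The template idea can be sketched in a few lines of Python. This is a minimal illustration rather than a model of the replication machinery; the names COMPLEMENT and complementary_strand and the example sequence are assumptions made for this sketch.

```python
# Pairing rules from the text: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(template):
    """Return the strand that pairs with `template`, written 5' to 3'.

    The two strands run in opposite directions, so the template is read
    backwards while each base is swapped for its partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template))

template = "ATGGCA"                      # hypothetical 5' to 3' sequence
copy = complementary_strand(template)
print(copy)                              # TGCCAT
print(complementary_strand(copy))        # ATGGCA
```

    Applying the function twice returns the original sequence, which is the sense in which either strand can serve as a template for the other.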

    Organelle DNA

    Not all genetic information is found in nuclear DNA. Both plants and animals have an organelle (a "little organ" within the cell) called the mitochondrion. Each mitochondrion has its own set of genes. Plants also have a second organelle, the chloroplast, which also has its own DNA. Cells often have multiple mitochondria, particularly cells requiring lots of energy, such as active muscle cells. This is because mitochondria are responsible for converting the energy stored in macromolecules into a form usable by the cell, namely, the adenosine triphosphate (ATP) molecule. Thus, they are often referred to as the power generators of the cell.

    Unlike nuclear DNA (the DNA found within the nucleus of a cell), half of which comes from our mother and half from our father, mitochondrial DNA is inherited only from our mother. This is because, in sexually reproducing animals, the mitochondria of the embryo come from the female gamete, or egg, and not from the male gamete, or sperm. Mitochondrial DNA also does not recombine; there is no shuffling of genes from one generation to the other, as there is with nuclear genes.

    Large numbers of mitochondria are found in the tail of sperm, providing them with an engine that generates the energy needed for swimming toward the egg. However, when the sperm enters the egg during fertilization, the tail falls off, taking away the father's mitochondria.

    Why Is There a Separate Mitochondrial Genome?

    The energy-conversion process that takes place in the mitochondria takes place aerobically, in the presence of oxygen. Other energy conversion processes in the cell take place anaerobically, or without oxygen. The independent aerobic function of these organelles is thought to have evolved from bacteria that lived inside of other simple organisms in a mutually beneficial, or symbiotic, relationship, providing them with aerobic capacity. Through the process of evolution, these tiny organisms became incorporated into the cell, and their genetic systems and cellular functions became integrated to form a single functioning cellular unit. Because mitochondria have their own DNA, RNA, and ribosomes, this scenario is quite possible. This theory is also supported by the existence of a eukaryotic organism, a type of amoeba, that lacks mitochondria and therefore must always maintain a symbiotic relationship with an aerobic bacterium.

    Why Study Mitochondria?

    There are many diseases caused by mutations in mitochondrial DNA (mtDNA). Because the mitochondria produce energy in cells, symptoms of mitochondrial diseases often involve degeneration or functional failure of tissue. For example, mtDNA mutations have been identified in some forms of diabetes, deafness, and certain inherited heart diseases. In addition, mutations in mtDNA are able to accumulate throughout an individual's lifetime. This is different from mutations in nuclear DNA, which has sophisticated repair mechanisms to limit the accumulation of mutations. Mitochondrial DNA mutations can also concentrate in the mitochondria of specific tissues. A variety of deadly diseases are attributable to a large number of accumulated mutations in mitochondria. There is even a theory, the Mitochondrial Theory of Aging, that suggests that accumulation of mutations in mitochondria contributes to, or drives, the aging process. These defects are associated with Parkinson's and Alzheimer's disease, although it is not known whether the defects actually cause or are a direct result of the diseases. However, evidence suggests that the mutations contribute to the progression of both diseases.

    In addition to their critical cellular energy-related functions, mitochondrial genes are useful to evolutionary biologists because of their maternal inheritance and high rate of mutation. By studying patterns of mutations, scientists are able to reconstruct patterns of migration and evolution within and between species. For example, mtDNA analysis has been used to trace the migration of people from Asia across the Bering Strait to North and South America. It has also been used to identify an ancient maternal lineage from which modern man evolved.

    Ribonucleic Acids

    Just like DNA, ribonucleic acid (RNA) is a chain, or polymer, of nucleotides with the same 5' to 3' direction of its strands. However, the ribose sugar component of RNA is slightly different chemically from that of DNA. RNA has a 2' oxygen atom that is not present in DNA. Other fundamental structural differences exist. For example, uracil takes the place of the thymine nucleotide found in DNA, and RNA is, for the most part, a single-stranded molecule. DNA directs the synthesis of a variety of RNA molecules, each with a unique role in cellular function. For example, all genes that code for proteins are first made into an RNA strand in the nucleus called a messenger RNA (mRNA). The mRNA carries the information encoded in DNA out of the nucleus to the protein assembly machinery, called the ribosome, in the cytoplasm. The ribosome complex uses mRNA as a template to synthesize the exact protein coded for by the gene.

    In addition to mRNA, DNA codes for other forms of RNA, including ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), and small nuclear RNAs (snRNAs). rRNAs and tRNAs participate in protein assembly, whereas snRNAs aid in a process called splicing, the editing of mRNA before it can be used as a template for protein synthesis.

    Proteins

    Although DNA is the carrier of genetic information in a cell, proteins do the bulk of the work. Proteins are long chains containing as many as 20 different kinds of amino acids. Each cell contains thousands of different proteins: enzymes that make new molecules and catalyze nearly all chemical processes in cells; structural components that give cells their shape and help them move; hormones that transmit signals throughout the body; antibodies that recognize foreign molecules; and transport molecules that carry oxygen. The genetic code carried by DNA is what specifies the order and number of amino acids and, therefore, the shape and function of the protein.

    "DNA makes RNA, RNA makes protein, and proteins make us."

    Francis Crick

    The "Central Dogma", a fundamental principle of molecular biology, states that genetic information flows from DNA to RNA to protein. Ultimately, however, the genetic code resides in DNA because only DNA is passed from generation to generation. Yet, in the process of making a protein, the encoded information must be faithfully transmitted first to RNA then to protein. Transferring the code from DNA to RNA is a fairly straightforward process called transcription. Deciphering the code in the resulting mRNA is a little more complex. It first requires that the mRNA leave the nucleus and associate with a large complex of specialized RNAs and proteins that, collectively, are called the ribosome. Here the mRNA is translated into protein by decoding the mRNA sequence in blocks of three RNA bases, called codons, where each codon specifies a particular amino acid. In this way, the ribosomal complex builds a protein one amino acid at a time, with the order of amino acids determined precisely by the order of the codons in the mRNA.

    In 1961, Marshall Nirenberg and Heinrich Matthaei correlated the first codon (UUU) with the amino acid phenylalanine. After that, it was not long before the genetic code for all 20 amino acids was deciphered.

    A given amino acid can have more than one codon. These redundant codons usually differ at the third position. For example, the amino acid serine is encoded by UCU, UCC, UCA, or UCG. This redundancy is key to accommodating mutations that occur naturally as DNA is replicated and new cells are produced. By allowing some of the random changes in DNA to have no effect on the ultimate protein sequence, a sort of genetic safety net is created. Some codons do not code for an amino acid at all but instruct the ribosome when to stop adding new amino acids.

    Table 1. RNA triplet codons and their corresponding amino acids.

    First   Second base
    base    U                  C              A               G
    U       UUU Phenylalanine  UCU Serine     UAU Tyrosine    UGU Cysteine
            UUC Phenylalanine  UCC Serine     UAC Tyrosine    UGC Cysteine
            UUA Leucine        UCA Serine     UAA Stop        UGA Stop
            UUG Leucine        UCG Serine     UAG Stop        UGG Tryptophan
    C       CUU Leucine        CCU Proline    CAU Histidine   CGU Arginine
            CUC Leucine        CCC Proline    CAC Histidine   CGC Arginine
            CUA Leucine        CCA Proline    CAA Glutamine   CGA Arginine
            CUG Leucine        CCG Proline    CAG Glutamine   CGG Arginine
    A       AUU Isoleucine     ACU Threonine  AAU Asparagine  AGU Serine
            AUC Isoleucine     ACC Threonine  AAC Asparagine  AGC Serine
            AUA Isoleucine     ACA Threonine  AAA Lysine      AGA Arginine
            AUG Methionine     ACG Threonine  AAG Lysine      AGG Arginine
    G       GUU Valine         GCU Alanine    GAU Aspartate   GGU Glycine
            GUC Valine         GCC Alanine    GAC Aspartate   GGC Glycine
            GUA Valine         GCA Alanine    GAA Glutamate   GGA Glycine
            GUG Valine         GCG Alanine    GAG Glutamate   GGG Glycine

    A translation chart of the 64 RNA codons.
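
    The chart can also be written as a lookup table and used to decode a message, which is roughly what the text describes the ribosome doing. The sketch below is illustrative only: the one-letter amino acid codes (for example F for phenylalanine, S for serine, M for methionine), the "*" marker for a stop codon, and the names CODON_TABLE, translate, and codons_for are assumptions of this example, not part of the original article.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes listed in U, C, A, G order for the first,
# second, and third base; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Decode an mRNA three bases at a time, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def codons_for(amino_acid):
    """List every codon that specifies the given one-letter amino acid."""
    return sorted(c for c, a in CODON_TABLE.items() if a == amino_acid)

print(translate("AUGUUUUCUUAA"))   # AUG UUU UCU UAA -> MFS
print(codons_for("S"))             # ['AGC', 'AGU', 'UCA', 'UCC', 'UCG', 'UCU']
```

    The second call shows the redundancy discussed above: serine is reached from six different codons, four of which differ only at the third position.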

    The Core Gene Sequence: Introns and Exons

    Genes make up about 1 percent of the total DNA in our genome. In the human genome, the coding portions of a gene, called exons, are interrupted by intervening sequences, called introns. In other words, a eukaryotic gene does not code for a protein in one continuous stretch of DNA. Both exons and introns are "transcribed" into mRNA, but before it is transported to the ribosome, the primary mRNA transcript is edited. This editing process removes the introns, joins the exons together, and adds unique features to each end of the transcript to make a "mature" mRNA. One might then ask what the purpose of an intron is if it is spliced out after it is transcribed. It is still unclear what all the functions of introns are, but scientists believe that some serve as the site for recombination, the process by which progeny derive a combination of genes different from that of either parent, resulting in novel genes with new combinations of exons, the key to evolution.
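
    The editing step can be pictured as cutting the exon segments out of the primary transcript and joining them in order. The following is a simplified sketch under stated assumptions: the sequence and exon coordinates are invented for illustration, the function name splice is hypothetical, and the features added to each end of the transcript are not modeled.

```python
def splice(primary_transcript, exons):
    """Join the exon segments in order, dropping everything between them.

    `exons` is a list of (start, end) positions, 0-based and end-exclusive.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)

# Hypothetical primary transcript: exon 1, one intron, exon 2.
primary_transcript = "AUGGCC" + "GUAAGUCCUUAG" + "UUUUAA"
exons = [(0, 6), (18, 24)]

mature_mrna = splice(primary_transcript, exons)
print(mature_mrna)   # AUGGCCUUUUAA
```

    Real splicing is guided by sequence signals at the exon-intron boundaries; here the coordinates are simply given.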

    Figure 2. Recombination.

    Recombination involves pairing between complementary strands of two parental duplex DNAs (top and middle panel). This process creates a stretch of hybrid DNA (bottom panel) in which the single strand of one duplex is paired with its complement from the other duplex.

    Gene Prediction Using Computers

    When the complete mRNA sequence for a gene is known, computer programs are used to align the mRNA sequence with the appropriate region of the genomic DNA sequence. This provides a reliable indication of the beginning and end of the coding region for that gene. In the absence of a complete mRNA sequence, the boundaries can be estimated by ever-improving, but still inexact, gene prediction software. The problem is the lack of a single sequence pattern that indicates the beginning or end of a eukaryotic gene. Fortunately, the middle of a gene, referred to as the core gene sequence, has enough consistent features to allow more reliable predictions.

    From Genes to Proteins: Start to Finish

    We just discussed that the journey from DNA to mRNA to protein requires that a cell identify where a gene begins and ends. This must be done during both the transcription and the translation process.

    Transcription

    Transcription, the synthesis of an RNA copy from a sequence of DNA, is carried out by an enzyme called RNA polymerase. This molecule has the job of recognizing the DNA sequence where transcription is initiated, called the promoter site. In general, there are two "promoter" sequences upstream from the beginning of every gene. The location and base sequence of each promoter site vary for prokaryotes (bacteria) and eukaryotes (higher organisms), but they are both recognized by RNA polymerase, which can then grab hold of the sequence and drive the production of an mRNA.

    Eukaryotic cells have three different RNA polymerases, each recognizing a different class of genes. RNA polymerase II is responsible for synthesis of mRNAs from protein-coding genes. This polymerase requires a sequence resembling TATAA, commonly referred to as the TATA box, which is found 25-30 nucleotides upstream of the beginning of the gene, referred to as the initiator sequence.

    Transcription terminates when the polymerase stumbles upon a termination, or stop signal. In eukaryotes, this process is not fully understood. Prokaryotes, however, tend to have a short region composed of G's and C's that is able to fold in on itself and form complementary base pairs, creating a stem in the new mRNA. This stem then causes the polymerase to trip and release the nascent, or newly formed, mRNA.

    Translation

    The beginning of translation, the process in which the genetic code carried by mRNA directs the synthesis of proteins from amino acids, differs slightly for prokaryotes and eukaryotes, although both processes always initiate at a codon for methionine. For prokaryotes, the ribosome recognizes and attaches at the sequence AGGAGGU on the mRNA, called the Shine-Dalgarno sequence, that appears just upstream from the methionine (AUG) codon. Curiously, eukaryotes lack this recognition sequence and simply initiate translation at the codon for the amino acid methionine, usually AUG, but sometimes GUG. Translation is terminated for both prokaryotes and eukaryotes when the ribosome reaches one of the three stop codons.
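
    For the prokaryotic case, the start and stop rules above can be sketched as a search for the recognition sequence, then the first AUG after it, then codon-by-codon reading until a stop codon. This is a deliberately simplified illustration: real ribosomes tolerate variation in the recognition sequence and its spacing, and the function name coding_codons and the example mRNA are assumptions made here.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def coding_codons(mrna, recognition="AGGAGGU"):
    """Return the codons read from the first AUG after the recognition
    sequence up to, but not including, the first stop codon."""
    site = mrna.find(recognition)
    if site == -1:
        return []                      # nothing for the ribosome to attach to
    start = mrna.find("AUG", site + len(recognition))
    if start == -1:
        return []                      # no methionine codon downstream
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons

mrna = "GGAGGAGGUAACCAUGAAAUGGUAAGC"   # hypothetical prokaryotic message
print(coding_codons(mrna))             # ['AUG', 'AAA', 'UGG']
```

    A message without the recognition sequence returns an empty list in this sketch, mirroring the idea that the prokaryotic ribosome needs that site to attach.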

    Structural Genes, Junk DNA, and Regulatory Sequences

    Structural Genes

    Sequences that code for proteins are called structural genes. Although it is true that proteins are the major components of structural elements in a cell, proteins are also the real workhorses of the cell. They perform such functions as transporting nutrients into the cell; synthesizing new DNA, RNA, and protein molecules; and transmitting chemical signals from outside to inside the cell, as well as throughout the cell, both critical to the process of making proteins.

    Over 98 percent of the genome is of unknown function. Although this DNA is often referred to as "junk" DNA, scientists are beginning to uncover the function of many of these intergenic sequences, the DNA found between genes.

    Regulatory Sequences

    A class of sequences called regulatory sequences makes up a numerically insignificant fraction of the genome but provides critical functions. For example, certain sequences indicate the beginning and end of genes, mark sites for initiating replication and recombination, or provide landing sites for proteins that turn genes on and off. Like structural genes, regulatory sequences are inherited; however, they are not commonly referred to as genes.

    Other DNA Regions

    Forty to forty-five percent of our genome is made up of short sequences that are repeated, sometimes hundreds of times. There are numerous forms of this "repetitive DNA", and a few have known functions, such as stabilizing the chromosome structure or inactivating one of the two X chromosomes in developing females, a process called X-inactivation. The most highly repeated sequences found so far in mammals are called "satellite DNA" because their unusual composition allows them to be easily separated from other DNA. These sequences are associated with chromosome structure and are found at the centromeres (or centers) and telomeres (ends) of chromosomes. Although they do not play a role in the coding of proteins, they do play a significant role in chromosome structure, duplication, and cell division. The highly variable nature of these sequences makes them an excellent "marker" by which individuals can be identified based on the unique pattern of their satellite DNA.

    Figure 3. A chromosome.

    A chromosome is composed of a very long molecule of DNA and associated proteins that carry hereditary information. The centromere, shown at the center of this chromosome, is a specialized structure that appears during cell division and ensures the correct distribution of duplicated chromosomes to daughter cells. Telomeres are the structures that seal the end of a chromosome. Telomeres play a critical role in chromosome replication and maintenance by counteracting the tendency of the chromosome to otherwise shorten with each round of replication.

    Another class of non-coding DNA is the "pseudogene", so named because it is believed to be a remnant of a real gene that has suffered mutations and is no longer functional. Pseudogenes may have arisen through the duplication of a functional gene, followed by inactivation of one of the copies. Comparing the presence or absence of pseudogenes is one method used by evolutionary geneticists to group species and to determine relatedness. Thus, these sequences are thought to carry a record of our evolutionary history.

    How Many Genes Do Humans Have?

    In February 2001, two largely independent draft versions of the human genome were published. Both studies estimated that there are 30,000 to 40,000 genes in the human genome, roughly one-third the number of previous estimates. More recently, scientists estimated that there are fewer than 30,000 human genes. However, we still have to make guesses at the actual number of genes, because not all of the human genome sequence is annotated and not all of the known sequence has been assigned a particular position in the genome.

    So, how do scientists estimate the number of genes in a genome? For the most part, they look for tell-tale signs of genes in a DNA sequence. These include: open reading frames, stretches of DNA, usually greater than 100 bases, that are not interrupted by a stop codon such as TAA, TAG, or TGA; start codons such as ATG; specific sequences found at splice junctions, locations in the sequence where the non-coding regions are removed from the RNA to form a continuous gene transcript for translation into a protein; and gene regulatory sequences. This process is dependent on computer programs that search for these patterns in various sequence databases and then make predictions about the existence of a gene.
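
    One of these signs, the open reading frame, is simple enough to search for directly. The sketch below scans the three reading frames of a single strand for a start codon followed, in the same frame, by a stop codon. It is a toy version of what gene prediction programs do, under stated assumptions: the name find_orfs, the min_length parameter, and the example sequence are hypothetical, only one strand is searched, and splice junctions and regulatory sequences are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_length=100):
    """Return (start, end) positions of open reading frames on one strand.

    Positions are 0-based and end-exclusive; each frame runs from an ATG
    to the first in-frame stop codon and must span at least `min_length` bases.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

# Hypothetical sequence, far shorter than a real gene, so the length
# threshold is lowered for the demonstration.
dna = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(dna, min_length=12))   # [(2, 23)]
print(find_orfs(dna))                  # []
```

    With the default threshold of 100 bases the short example is rejected, which illustrates why a minimum length is used: short start-to-stop stretches occur by chance throughout any sequence.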

    From One Gene–One Protein to a More Global Perspective

    Only a small percentage of the 3 billion bases in the human genome becomes an expressed gene product. However, of the approximately 1 percent of our genome that is expressed, 40 percent is alternatively spliced to produce multiple proteins from a single gene. Alternative splicing refers to the cutting and pasting of the primary mRNA transcript into various combinations of mature mRNA. Therefore the one gene–one protein theory, originally framed as "one gene–one enzyme", does not precisely hold.
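
    To see how one gene can yield several mature mRNAs, consider a toy model in which some exons are always kept and others may be skipped. This is only one simplified picture of alternative splicing; the exon sequences, the choice of which exons are optional, and the function name isoforms are assumptions made for this sketch.

```python
from itertools import combinations

def isoforms(exons, optional):
    """Return every mature mRNA obtained by skipping any subset of the
    exons whose indexes are listed in `optional`."""
    results = []
    for count in range(len(optional) + 1):
        for skipped in combinations(optional, count):
            kept = (seq for idx, seq in enumerate(exons) if idx not in skipped)
            results.append("".join(kept))
    return results

# Hypothetical gene with four exons; the two middle exons are optional.
exons = ["AUGGCU", "GAAGAA", "CCUCCU", "UGGUAA"]
variants = isoforms(exons, optional=[1, 2])
for mrna in variants:
    print(mrna)
print(len(variants))   # 4
```

    Two optional exons already give four distinct transcripts from a single gene, and the count doubles with each additional optional exon in this model.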

    With so much DNA in the genome, why restrict transcription to a tiny portion, and why make that tiny portion work overtime to produce many alternate transcripts? This process may have evolved as a way to limit the deleterious effects of mutations. Genetic mutations occur randomly, and the effect of a small number of mutations on a single gene may be minimal. However, many genes each carrying small changes could weaken the individual, and thus the species. On the other hand, if a single mutation affects several alternate transcripts at once, it is more likely that the effect will be devastating: the individual may not survive to contribute to the next generation. Thus, alternate transcripts from a single gene could reduce the chances that a mutated gene is transmitted.

    Gene Switching: Turning Genes On and Off

    The estimated number of genes for humans, fewer than 30,000, is not so different from the 25,300 known genes of Arabidopsis thaliana, commonly called mustard grass. Yet, we appear, at least at first glance, to be a far more complex organism. A person may wonder how this increased complexity is achieved. One answer lies in the regulatory system that turns genes on and off. This system also precisely controls the amount of a gene product that is produced and can further modify the product after it is made. This exquisite control requires multiple regulatory input points. One very efficient point occurs at transcription, such that an mRNA is produced only when a gene product is needed. Cells also regulate gene expression by post-transcriptional modification; by allowing only a subset of the mRNAs to go on to translation; or by restricting translation of specific mRNAs to only when the product is needed. At other levels, cells regulate gene expression through DNA folding, chemical modification of the nucleotide bases, and intricate "feedback mechanisms" in which some of the gene's own protein product directs the cell to cease further protein production.

    Controlling Transcription

    Promoters and Regulatory Sequences

    Transcription is the process whereby RNA is made from DNA. It is initiated when an enzyme, RNA polymerase, binds to a site on the DNA called a promoter sequence. In most cases, the polymerase is aided by a group of proteins called "transcription factors" that perform specialized functions, such as DNA sequence recognition and regulation of the polymerase's enzyme activity. Other regulatory sequences include activators, repressors, and enhancers. These sequences can be cis-acting (affecting genes that are adjacent to the sequence) or trans-acting (affecting expression of the gene from a distant site), even on another chromosome.

    The Globin Genes: An Example of Transcriptional Regulation

    An example of transcriptional control occurs in the family of genes responsible for the production of globin. Globin is the protein that complexes with the iron-containing heme molecule to make hemoglobin. Hemoglobin transports oxygen to our tissues via red blood cells. In the adult, red blood cells do not contain DNA for making new globin; they are ready-made with all of the hemoglobin they will need.

    During the first few weeks of life, embryonic globin is expressed in the yolk sac of the egg. By week five of gestation, globin is expressed in early liver cells. By birth, red blood cells are being produced, and globin is expressed in the bone marrow. Yet, the globin found in the yolk is not produced from the same gene as is the globin found in the liver or bone marrow stem cells. In fact, at each stage of development, different globin genes are turned on and off through a process of transcriptional regulation called "switching".

    To further complicate matters, globin is made from two different protein chains: an alpha-like chain coded for on chromosome 16; and a beta-like chain coded for on chromosome 11. Each chromosome has the embryonic, fetal, and adult form lined up on the chromosome in a sequential order for developmental expression. The developmentally regulated transcription of globin is controlled by a number of cis-acting DNA sequences, and although there remains a lot to be learned about the interaction of these sequences, one known control sequence is an enhancer called the Locus Control Region (LCR). The LCR sits far upstream on the sequence and controls the alpha genes on chromosome 16. It may also interact with other factors to determine which alpha gene is turned on.

    Thalassemias are a group of diseases characterized by the absence or decreased production of normal globin, and thus hemoglobin, leading to decreased oxygen in the system. There are alpha and beta thalassemias, defined by the defective gene, and there are variations of each of these, depending on whether the embryonic, fetal, or adult forms are affected and/or expressed.

    Although there is no known cure for the thalassemias, there are medical treatments that have been developed based on our current understanding of both gene regulation and cell differentiation. Treatments include blood transfusions, iron chelators, and bone marrow transplants. With continuing research in the areas of gene regulation and cell differentiation, new and more effective treatments may soon be on the horizon, such as the advent of gene transfer therapies.

    The Influence of DNA Structure and Binding Domains

    Sequences that are important in regulating transcription do not necessarily code for transcription factors or other proteins. Transcription can also be regulated by subtle variations in DNA structure and by chemical changes in the bases to which transcription factors bind. As stated previously, the chemical properties of the four DNA bases differ slightly, providing each base with unique opportunities to chemically react with other molecules. One chemical modification of DNA, called methylation, involves the addition of a methyl group (-CH3). Methylation frequently occurs at cytosine residues that are followed by guanine bases, oftentimes in the vicinity of promoter sequences. The methylation status of DNA often correlates with its functional activity, where inactive genes tend to be more heavily methylated. This is because the methyl group serves to inhibit transcription by attracting a protein that binds specifically to methylated DNA, thereby interfering with polymerase binding. Methylation also plays an important role in genomic imprinting, which occurs when both maternal and paternal alleles are present but only one allele is expressed while the other remains inactive. Another way to think of genomic imprinting is as "parent of origin differences" in the expression of inherited traits. Considerable intrigue surrounds the effects of DNA methylation, and many researchers are working to unlock the mystery behind this concept.
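
    Candidate methylation sites can be located by scanning a sequence for a cytosine immediately followed by a guanine on the same strand, a pair commonly written CpG. The sketch below only finds positions; it says nothing about whether a given site is actually methylated in a cell. The function name cpg_sites and the example sequence are assumptions of this illustration.

```python
def cpg_sites(dna):
    """Return the 0-based position of every C that is directly followed by a G."""
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == "CG"]

sequence = "TTCGACGCGTA"          # hypothetical stretch near a promoter
print(cpg_sites(sequence))        # [2, 5, 7]
```

    A region where such positions are unusually dense would be a natural place to look for the promoter-associated methylation described above.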

    Controlling Translation

    Translation is the process whereby the genetic code carried by an mRNA directs the synthesis of proteins. Translational regulation occurs through the binding of specific molecules, called repressor proteins, to a sequence found on an RNA molecule. Repressor proteins prevent a gene from being expressed. As we have just discussed, the default state for a gene is that of being expressed via the recognition of its promoter by RNA polymerase. Close to the promoter region is another cis-acting site called the operator, the target for the repressor protein. When the repressor protein binds to the operator, RNA polymerase is prevented from initiating transcription, and gene expression is turned off.

    Translational control plays a significant role in the process of embryonic development and cell differentiation. Upon fertilization, an egg cell begins to multiply to produce a ball of cells that are all the same. At some point, however, these cells begin to differentiate, or change into specific cell types. Some will become blood cells or kidney cells, whereas others may become nerve or brain cells. When all of the cells formed are alike, the same genes are turned on. However, once differentiation begins, various genes in different cells must become active to meet the needs of that cell type. In some organisms, the egg stores immature mRNAs that become translationally active only after fertilization. Fertilization then serves to trigger mechanisms that initiate the efficient translation of mRNA into proteins. Similar mechanisms serve to activate mRNAs at other stages of development and differentiation, such as when specific protein products are needed.

    Mechanisms of Genetic Variation and Heredity

    Does Everyone Have the Same Genes?

    When you look at the human species, you see evidence of a process called genetic variation, that is, there are immediately recognizable differences in human traits, such as hair and eye color, skin pigment, and height. Then there are the not so obvious genetic variations, such as blood type. These expressed, or phenotypic, traits are attributable to genotypic variation in a person's DNA sequence. When two individuals display different phenotypes of the same trait, they are said to have two different alleles for the same gene. This means that the gene's sequence is slightly different in the two individuals, and the gene is said to be polymorphic, "poly" meaning many and "morph" meaning shape or form. Therefore, although people generally have the same genes, the genes do not have exactly the same DNA sequence. These polymorphic sites influence gene expression and also serve as markers for genomic research efforts.

    Genetic Variation

    The cell cycle is the process that a cell undergoes to replicate. The manner in which a cell replicates differs with the various classes of life forms, as well as with the end purpose of the cell replication. Cells that compose tissues in multicellular organisms typically replicate by organized duplication and spatial separation of their cellular genetic material, a process called mitosis. Meiosis is the mode of cell replication for the formation of sperm and egg cells in plants, animals, and many other multicellular life forms. Meiosis differs significantly from mitosis in that the cellular progeny have their complement of genetic material reduced to half that of the parent cell.

    Most genetic variation occurs during the phases of the cell cycle when DNA is duplicated. Mutations in the new DNA strand can manifest as base substitutions, such as when a single base gets replaced with another; deletions, where one or more bases are left out; or insertions, where one or more bases are added. Mutations can be either synonymous, in which the variation still results in a codon for the same amino acid, or non-synonymous, in which the variation results in a codon for a different amino acid. Mutations can also cause a frame shift, which occurs when an insertion or deletion bumps the reference point for reading the genetic code down a base or two and results in loss of part, or sometimes all, of that gene product. DNA mutations can also be introduced by toxic chemicals and, particularly in skin cells, exposure to ultraviolet radiation.
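    The difference between synonymous, non-synonymous, and frame-shift mutations can be made concrete with a short sketch. The code below assumes the standard genetic code and a made-up nine-base coding sequence; the function names are hypothetical and are not part of the primer.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding strand codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_substitution(ref: str, pos: int, new_base: str) -> str:
    """Label a single-base substitution by its effect on the protein sequence."""
    mutant = ref[:pos] + new_base + ref[pos + 1:]
    return "synonymous" if translate(ref) == translate(mutant) else "non-synonymous"


reference = "ATGGAAGTT"  # illustrative sequence: Met-Glu-Val
print(translate(reference))                       # MEV
print(classify_substitution(reference, 5, "G"))   # GAA -> GAG, still Glu
print(classify_substitution(reference, 4, "T"))   # GAA -> GTA, Glu becomes Val

# Deleting one base shifts the reading frame for everything downstream.
deletion = reference[:3] + reference[4:]
print(translate(deletion))                        # MK
```

    In the deletion case, every codon after the missing base is read out of register, which is why a frame shift can destroy most of a gene product.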

    Mutations that occur in somatic cells (any cell in the body except gametes and their precursors) will not be passed on to the next generation. This does not mean, however, that somatic cell mutations, sometimes called acquired mutations, are benign. For example, as your skin cells prepare to divide and produce new skin cells, errors may be inadvertently introduced when the DNA is duplicated, resulting in a daughter cell that contains the error. Although most defective cells die quickly, some can persist and may even become cancerous if the mutation affects the ability to regulate cell growth.

    Mutations and the Next Generation

    There are two places where mutations can be introduced and carried into the next generation. In the first stages of development, a sperm cell and egg cell fuse. They then begin to divide, giving rise to cells that differentiate into tissue-specific cell types. One early type of differentiated cell is the germ line cell, which may ultimately develop into mature gametes. If a mutation occurs in the developing germ line cell, it may persist until that individual reaches reproductive age. Now the mutation has the potential to be passed on to the next generation.

    Mutations may also be introduced during meiosis, the mode of cell replication for the formation of sperm and egg cells. In this case, the germ line cell is healthy, and the mutation is introduced during the actual process of gamete replication. Once again, the sperm or egg will contain the mutation, and during the reproductive process, this mutation may then be passed on to the offspring.

    One should bear in mind that not all mutations are bad. Mutations also provide a species with the opportunity to adapt to new environments, as well as to protect a species from new pathogens. Mutations are what lie behind the popular saying of "survival of the fittest", a phrase associated with the theory of evolution by natural selection proposed by Charles Darwin in 1859. This theory proposes that as new environments arise, individuals carrying certain mutations that confer an evolutionary advantage will survive to pass these mutations on to their offspring. It does not suggest that a mutation is derived from the environment, but that survival in that environment is enhanced by a particular mutation. Some genes, and even some organisms, have evolved to tolerate mutations better than others. For example, some viral genes are known to have high mutation rates. Mutations serve the virus well by enabling adaptive traits, such as changes in the outer protein coat so that it can escape detection and thereby destruction by the host's immune system. Viruses also produce certain enzymes that are necessary for infection of a host cell. A mutation within such an enzyme may result in a new form that still allows the virus to infect its host but that is no longer blocked by an anti-viral drug. This will allow the virus to propagate freely in its environment.

    Mendel's Laws: How We Inherit Our Genes

    In work published in 1866, Gregor Mendel described the transmission of seven different pea traits, which he had studied by carefully test-crossing many distinct varieties of peas. Studying garden peas might seem trivial to those of us who live in a modern world of cloned sheep and gene transfer, but Mendel's simple approach led to fundamental insights into genetic inheritance, known today as Mendel's Laws. Mendel did not actually know or understand the cellular mechanisms that produced the results he observed. Nonetheless, he correctly surmised the behavior of traits and the mathematical predictions of their transmission, the independent segregation of alleles during gamete production, and the independent assortment of genes. Perhaps as amazing as Mendel's discoveries was the fact that his work was largely ignored by the scientific community for over 30 years!

    Mendel's Principles of Genetic Inheritance

    Law of Segregation: Each of the two inherited factors (alleles) possessed by the parent will segregate and pass into separate gametes (eggs or sperm) during meiosis, which will each carry only one of the factors.

    Law of Independent Assortment: In the gametes, alleles of one gene separate independently of those of another gene, and thus all possible combinations of alleles are equally probable.

    Law of Dominance: Each trait is determined by two factors (alleles), inherited one from each parent. These factors each exhibit a characteristic dominant, co-dominant, or recessive expression, and those that are dominant will mask the expression of those that are recessive.
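    The three principles above can be combined in a small simulation. The sketch below is illustrative only: it assumes two unlinked genes with hypothetical alleles A/a and B/b, full dominance of the uppercase allele, and equal viability of all offspring. It enumerates every gamete pairing from two doubly heterozygous parents.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Segregation: each gamete receives one allele from every gene's pair."""
    return list(product(*genotype))


def phenotype(egg, sperm):
    """Dominance: an uppercase (dominant) allele masks a lowercase (recessive) one."""
    return "".join(
        e.upper() if (e.isupper() or s.isupper()) else e.lower()
        for e, s in zip(egg, sperm)
    )


def cross(parent1, parent2):
    """Independent assortment: every gamete pairing is counted as equally likely."""
    return Counter(
        phenotype(e, s) for e, s in product(gametes(parent1), gametes(parent2))
    )


dihybrid = [("A", "a"), ("B", "b")]  # hypothetical AaBb parent
ratios = cross(dihybrid, dihybrid)
print(ratios)  # 9 AB : 3 Ab : 3 aB : 1 ab
```

    The 9:3:3:1 ratio that falls out of the enumeration is the classic result for two independently assorting genes with full dominance.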

    How Does Inheritance Work?

    Our discussion here is restricted to sexually reproducing organisms where each gene in an individual is represented by two copies, called alleles, one on each chromosome of a pair. There may be more than two alleles, or variants, for a given gene in a population, but only two alleles can be found in an individual. Therefore, the probability that a particular allele will be inherited is 50:50, that is, alleles randomly and independently segregate into daughter cells, although there are some exceptions to this rule.

    The term diploid describes a state in which a cell has two sets of homologous chromosomes, that is, two copies of each chromosome, one inherited from each parent. The maturation of germ line stem cells into gametes requires that the diploid number of chromosomes be reduced by half. Hence, gametes are said to be haploid, having only a single set of chromosomes. This reduction is accomplished through a process called meiosis, where one chromosome of each diploid pair is sent to each daughter gamete. Human gametes, therefore, contain 23 chromosomes, half the number found in somatic cells (all the other cells of the body).

    Because the chromosomes in one pair separate independently of all other chromosomes, each new gamete has the potential for a totally new combination of chromosomes. In humans, the independent segregation of the 23 chromosome pairs can lead to as many as 2^23, or roughly 8.4 million, different combinations in one individual's gametes. Only one of these gametes will combine with one of the roughly 8.4 million possible combinations from the other parent, generating a staggering potential for individual variation. Yet, this is just the beginning. Even more variation is possible when you consider the recombination between sections of chromosomes during meiosis as well as the random mutation that can occur during DNA replication. With such a range of possibilities, it is amazing that siblings look so much alike!
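    The size of these numbers is easy to check. The sketch below assumes the simplest model, in which each of the 23 chromosome pairs contributes one of two chromosomes to a gamete, with no recombination and no new mutations; the function names are hypothetical.

```python
HUMAN_CHROMOSOME_PAIRS = 23


def gamete_combinations(pairs: int) -> int:
    """Two choices per chromosome pair, made independently for every pair."""
    return 2 ** pairs


def zygote_combinations(pairs: int) -> int:
    """Any gamete from one parent can meet any gamete from the other parent."""
    return gamete_combinations(pairs) ** 2


if __name__ == "__main__":
    print(f"{gamete_combinations(HUMAN_CHROMOSOME_PAIRS):,}")  # 8,388,608
    print(f"{zygote_combinations(HUMAN_CHROMOSOME_PAIRS):,}")  # 70,368,744,177,664
```

    Under this model one couple could in principle produce about 70 trillion chromosome combinations before recombination and mutation are even considered.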

    Expression of Inherited Genes

    Gene expression, as reflected in an organism's phenotype, is based on conditions specific for each copy of a gene. As we just discussed, for every human gene there are two copies, and for every gene there can be several variants or alleles. If both alleles are the same, the individual is said to be homozygous for that gene. If the alleles are different, the individual is said to be heterozygous. For some alleles, their influence on phenotype takes precedence over all other alleles. For others, expression depends on whether the gene appears in the homozygous or heterozygous state. Still other phenotypic traits are a combination of several alleles from several different genes. Determining the allelic condition used to be accomplished solely through the analysis of pedigrees, much the way Mendel carried out his experiments on peas. However, this method can leave many questions unanswered, particularly for traits that are a result of the interaction between several different genes. Today, molecular genetic techniques exist that can assist researchers in tracking the transmission of traits by pinpointing the location of individual genes, identifying allelic variants, and identifying those traits that are caused by multiple genes.

    The Nature of Alleles

    A dominant allele is an allele that is almost always expressed, even if only one copy is present. Dominant alleles express their phenotype even when paired with a different allele, that is, when heterozygous. In this case, the phenotype appears the same in both the heterozygous and homozygous states. Just how the dominant allele overshadows the other allele depends on the gene, but in some cases the dominant allele produces a gene product that the other allele does not. Well-known dominant alleles occur in the human genes for Huntington disease, a form of dwarfism called achondroplasia, and polydactylism (extra fingers and toes).

    On the other hand, a recessive allele will be expressed only if there are two identical copies of that allele, or for a male, if one copy is present on the X chromosome. The phenotype of a recessive allele is only seen when both alleles are the same. When an individual has one dominant allele and one recessive allele, the recessive trait is not expressed because it is overshadowed by the dominant allele. The individual is said to be a carrier for that trait. Examples of recessive disorders in humans include sickle cell anemia, Tay-Sachs disease, and phenylketonuria (PKU).

    A particularly important category of genetic linkage has to do with the X and Y sex chromosomes. These chromosomes not only carry the genes that determine male and female traits, but also those for some other characteristics as well. Genes that are carried by either sex chromosome are said to be sex linked. Men normally have an X and a Y combination of sex chromosomes, whereas women have two X's. Because only men inherit Y chromosomes, they are the only ones to inherit Y-linked traits. Both men and women can have X-linked traits because both inherit X chromosomes.

    X-linked traits not related to feminine body characteristics are primarily expressed in the phenotype of men. This is because men have only one X chromosome. Consequently, genes on that chromosome that do not code for gender are expressed in the male phenotype, even if they are recessive. In women, a recessive allele on one X chromosome is often masked in their phenotype by a dominant normal allele on the other. This explains why women are frequently carriers of X-linked traits but more rarely have them expressed in their own phenotypes. In humans, at least 320 genes are X-linked. These include the genes for hemophilia, red-green color blindness, and congenital night blindness. There are at least a dozen Y-linked genes, in addition to those that code for masculine physical traits.
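    The asymmetry between sons and daughters can be seen by enumerating one cross. The sketch below is a simplified illustration with hypothetical labels: "H" stands for a dominant normal X-linked allele, "h" for a recessive allele, and "Y" for the father's Y chromosome. It assumes each parental chromosome is passed on with equal probability.

```python
from collections import Counter
from itertools import product


def classify_child(from_mother: str, from_father: str) -> str:
    """Describe a child given the X allele from the mother and the father's contribution."""
    if from_father == "Y":
        # Sons have a single X, so a recessive allele on it is always expressed.
        return "affected son" if from_mother == "h" else "unaffected son"
    recessive_copies = [from_mother, from_father].count("h")
    if recessive_copies == 2:
        return "affected daughter"
    return "carrier daughter" if recessive_copies == 1 else "unaffected daughter"


def x_linked_cross(mother, father):
    """Count the equally likely outcomes of one mother-by-father cross."""
    return Counter(classify_child(m, f) for m, f in product(mother, father))


carrier_mother = ("H", "h")
unaffected_father = ("H", "Y")
outcomes = x_linked_cross(carrier_mother, unaffected_father)
for label, count in sorted(outcomes.items()):
    print(f"{label}: {count}/4")
```

    With a carrier mother and an unaffected father, half of the sons are expected to show the trait, while no daughter does and half of the daughters are carriers.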

    It is now known that one of the X chromosomes in the cells of human females is completely, or mostly, inactivated early in embryonic life. This is a normal self-preservation action to prevent a potentially harmful double dose of genes. Recent research points to the "Xist" gene on the X chromosome as being responsible for a sequence of events that silences one of the X chromosomes in women. The inactivated X chromosomes become highly compacted structures known as Barr bodies. The presence of Barr bodies has been used at international sport competitions as a test to determine whether an athlete is a male or a female.

    Exceptions to Mendel's Laws

    There are many examples of inheritance that appear to be exceptions to Mendel's laws. Usually, they turn out to represent complex interactions among various allelic conditions. For example, co-dominant alleles both contribute to a phenotype. Neither is dominant over the other. Control of the human blood group system provides a good example of co-dominant alleles.

    The Four Basic Blood Types

    There are four basic blood types, and they are O, A, B, and AB. We know that our blood type is determined by the "alleles" that we inherit from our parents. For the blood type gene, there are three basic blood type alleles: A, B, and O. We all have two alleles, one inherited from each parent. The possible combinations of the three alleles are OO, AO, BO, AB, AA, and BB. The A and B alleles are "co-dominant", whereas O is "recessive". A co-dominant allele is apparent even if only one is present; a recessive allele is apparent only if two recessive alleles are present. Because the O allele is recessive, it is not apparent if the person inherits an A or B allele along with it. So, the possible allele combinations result in a particular blood type in this way:

    OO = blood type O
    AO = blood type A
    BO = blood type B
    AB = blood type AB
    AA = blood type A
    BB = blood type B

    You can see that a person with blood type B may have a B and an O allele, or they may have two B alleles. If both parents are blood type B and both have a B and a recessive O, then their children will either be BB, BO, or OO. If the child is BB or BO, they have blood type B. If the child is OO, he or she will have blood type O.
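    The genotype-to-blood-type mapping and the example cross between two BO parents can be worked through in a few lines. This is a minimal sketch using hypothetical function names; it assumes each parent passes on either allele with equal probability.

```python
from collections import Counter
from itertools import product


def blood_type(genotype) -> str:
    """A and B are co-dominant; O shows only when no A or B allele is present."""
    expressed = set(genotype) - {"O"}
    return "".join(sorted(expressed)) or "O"


def cross(parent1, parent2):
    """Count the blood types among the four equally likely allele pairings."""
    return Counter(blood_type(pair) for pair in product(parent1, parent2))


# Mapping from the six possible genotypes to the four blood types.
for genotype in ["OO", "AO", "BO", "AB", "AA", "BB"]:
    print(genotype, "= blood type", blood_type(genotype))

# Both parents are blood type B and carry a recessive O allele.
children = cross("BO", "BO")
print(children)  # three pairings give type B, one gives type O
```

    The enumeration shows that two type B parents who each carry an O allele have, on average, a one in four chance of having a type O child.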

    Pleiotropism, or pleiotropy, refers to the phenomenon in which a single gene is responsible for producing multiple, distinct, and apparently unrelated phenotypic traits, that is, an individual can exhibit many different phenotypic outcomes. This is because the gene product is active in many places in the body. An example is Marfan's syndrome, where there is a defect in the gene coding for a connective tissue protein. Individuals with Marfan's syndrome exhibit abnormalities in their eyes, skeletal system, and cardiovascular system.

    Some genes mask the expression of other genes just as a fully dominant allele masks the expression of its recessive counterpart. A gene that masks the phenotypic effect of another gene is called an epistatic gene; the gene it subordinates is the hypostatic gene. The gene for albinism in humans is an epistatic gene. It is not part of the interacting skin-color genes. Rather, its dominant allele is necessary for the development of any skin pigment, and its recessive homozygous state results in the albino condition, regardless of how many other pigment genes may be present. Because of the effects of an epistatic gene, some individuals who inherit the dominant, disease-causing gene show only partial symptoms of the disease. Some, in fact, may show no expression of the disease-causing gene, a condition referred to as nonpenetrance. The individual in whom such a nonpenetrant mutant gene exists will be phenotypically normal but still capable of passing the deleterious gene on to offspring, who may exhibit the full-blown disease.

    Then we have traits that are multigenic, that is, they result from the expression of several different genes. This is true for human eye color, in which at least three different genes are responsible for determining eye color. A brown/blue gene and a central brown gene are both found on chromosome 15, whereas a green/blue gene is found on chromosome 19. The interaction between these genes is not well understood. It is speculated that there may be other genes that control other factors, such as the amount of pigment deposited in the iris. This multigenic system explains why two blue-eyed individuals can have a brown-eyed child.

    Speaking of eye color, have you ever seen someone with one green eye and one brown eye? In this case, somatic mosaicism may be the culprit. This is probably easier to describe than explain. In multicellular organisms, every cell in the adult is ultimately derived from the single-cell fertilized egg. Therefore, every cell in the adult normally carries the same genetic information. However, what would happen if a mutation occurred in only one cell at the two-cell stage of development? Then the adult would be composed of two types of cells: cells with the mutation and cells without. If a mutation affecting melanin production occurred in one of the cells in the cell lineage of one eye but not the other, then the eyes would have different genetic potential for melanin synthesis. This could produce eyes of two different colors.

    Penetrance refers to the degree to which a particular allele is expressed in a population phenotype. If every individual carrying a dominant mutant gene demonstrates the mutant phenotype, the gene is said to show complete penetrance.

    Molecular Genetics: The Study of Heredity, Genes, and DNA

    As we have just learned, DNA provides a blueprint that directs all cellular activities and specifies the developmental plan of multicellular organisms. Therefore, an understanding of DNA, gene structure, and function is fundamental for an appreciation of the molecular biology of the cell. Yet, it is important to recognize that progress in any scientific field depends on the availability of experimental tools that allow researchers to make new scientific observations and conduct novel experiments. The last section of the genetic primer concludes with a discussion of some of the laboratory tools and technologies that allow researchers to study cells and their DNA.

    http://www.ncbi.nlm.nih.gov/About/primer/genetics_molecular.html