jason - dna barcodes and watermarks

Upload: impellotyrannis

Post on 30-May-2018

230 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    1/59

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    2/59

    REPORT DOCUMENTATION PAGE Form ApprovedOMB No. 0704-0188Public reporting burden for this collection of information estimaed to average 1hour per response, including the time for review instructions, searching existing data sources, gatheringand maintaining the data needed, and completing and reviewing the cdlection of information. Send comments regarding this burden estimate or any other aspect of this collection ofinformation, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway,Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget. Paperwork Reduction Project (0704-0188), Washington, DC 20503.1. AGENCY USE ONLY (Leave blank) 2. REPORT DATE 3. REPORT TYPE AND DATES COVERED

    June 20044. TITLE AND SUBTITLE

    5. FUNDING NUMBERSDNA Barcodes and Watermarks

    6. AUTHOR(S) 13049022-DCSteven Block, et al.

    7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION REPORT NUMBERThe MITRE CorporationJASON Program Office JSR-03-3057515 Colshire DriveMcLean, Virginia 22102

    9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSORING/MONITORINGAGENCY REPORT NUMBERDepartment ofEnergyOffice of Science JSR-03-305Washington, DC 20585

    11. SUPPLEMENTARY NOTES

    12a. DISTRIBUTION/AVAILABILITY STATEMENT 12b. DISTRIBUTION CODE

    Approved for public release; distribution unlimited. Distribution Statement A13. ABSTRACT (Maximum 200 words)

    This study was conducted on behalf of the DOE during the summer of 2003. The JASON Studyexplored the feasibility of a program to tag genetically the microorganisms used for bioremediation,for the purpose of identification.

    14. SUBJECTTERMS

    17. SECURITY CLASSIFICATION 18. SECURITY CLASSIFICATIONOF REPORT OFTHIS PAGEUNCLASSIFIED UNCLASSIFIED

    19. SECURITY CLASSIFICATIONOF ABSTRACTUNCLASSIFIED

    15. NUMBER OF PAGES

    16. PRICE CODE

    20. LIMITATION OF ABSTRACTSAR

    Standard Form 298 (Rev. 289)Prescribed by ANSI Std. 239-18298-102

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    3/59

    Contents1 EXECUTIVE SUMMARY ...............................................................................................1The Bioremediation Problem ............................................................................................... 1

    The Barcode Concept. .......................................................................................................... 1Barcode and Watermark Implementation ............................................................................ lRecommendations ................................................................................................................ 22 JASON STUDY (JSR-03-305) DNA BARCODES & WATERMARKS ......................5Briefers We Heard ............................................................................................................... 5Introduction: Statement of the Problem .............................................................................. 7Multiple Types ofBioremediation ...................................................................................... 8The DNA Barcode Concept ...............................................................................................11What a DNA Barcode is NOT ........................................................................................... 12What a DNA Barcode IS ................................................................................................... 13What is a DNA Watermark? ..............................................................................................14The BarcodelWatermark Tagging System .........................................................................15Introduction to the Bacterial Chromosome ....................................................................... .15Where to Place Barcodes & Watermarks .......................................................................... 16Anatomy of a Barcode .......................................................................................................18A Practical Barcode is Shown ........................................................................................... 20How Many Practical Barcodes Exist? ............................................................................... 20

    Constructing a DNA Watermark ....................................................................................... 22Introducing Barcodes & Watermarks ................................................................................23Placing Barcodes & Watermarks by Recommendation .................................................... .24Placing Barcodes by Group II intron "Retro-homing" ...................................................... 25Reading Barcodes & Watermarks ...................................................................................... 27A Worked Example:Barcoding & Watermarking the Model Organism, D. radiodurans ............................ .29Selecting the Barcode Insertion Site .................................................................................. 30Choosing the Watermark Sites ..........................................................................................31An Alternative Barcode Scheme ........................................................................................ 32A JASON Idea: Fast, Adjustable Molecular Clocks .........................................................33A Clock Based on Tandem Repeats ................................................................................... 34A Better Clock Based on a Directed, Error-Prone Polymerase .........................................35Barcode and Watermark Considerations .......................................................................... .36Barcodes and Bioremediation ............................................................................................ 37JASON Recommendations .............................................................................................. 409

    3 APPENDICESAppendix A: Construction and Analysis ofDNA Barcode Libraries ............................ .43Appendix B: Placing Barcodes and Watermarks byRecombination within the Genome .............................................................................. 50

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    4/59

    1. EXECUTIVE SUMMARY

    The Bioremediation Problem

    It is estimated that 60% of DOE facilities contain groundwater contaminated by heavymetals, radionuclides, or chlorinated hydrocarbons. Moreover, 50% of the topsoil or sediment atDOE facilities is now contaminated. The DOE faces a massive remediation problem, with a needto treat 1.7 trillion gallons of polluted water and 40 million cubic meters of contaminated soil. Toassist in this considerable effort, a "genomic approach to waste cleanup" is currently beingexplored, harnessing the power ofmicrobial biochemistry (in communities of living bacteria andfungi) to degrade complex organic molecules, and to reduce or sequester certain chemicals,particularly heavy metals. Bioremediation can take on several forms, ranging from naturaldegradation (' intrinsic bioremediation'), to encouraging the growth of endogenous organisms insitu ('enhanced bioremediation'), to the introduction ofnon-native microbial species('bioaugmentation'), to the application of sophisticated bioengineering to generate novel strainsoptimized for the specific remediation task at hand. Irrespective of he origin of he microbesused, however, it will be vital to establish a socially and legally acceptable means offollowingthe growth and ecology ofall species used or bioremediation.The Barcode Concept

    This JASON Study explored the feasibility of a program to tag genetically themicroorganisms used for bioremediation, for the purpose of identification. Such DNA-based tagswould be fully heritable, but carefully designed to convey no phenotype to the organisms beinglabeled ("genotype without phenotype"). Tags would be structured so that they could bespecifically amplified by PCR (using a universal set ofprimers) and rapidly read in the field or inthe laboratory. The system we contemplate would support the sensitive, multiplexed readout ofmixed populations of strains, such as those typically found in complex microbial communities.Moreover, DNA tags would be designed to be robust against most mutations and certain kinds ofintentional interference. Properly implemented, these tags would constitute an effective'barcode' system for tracking microorganisms in the wild. DNA barcode tracking could be usedquantitatively to monitor microbial growth, dispersal, transport, blooms, die-outs, ecologicalniches, and more. All DNA barcode labels would be registered in a public database, and theywould be designed and introduced following uniform standards established by the DOE. Barcodeintegrity could be further protected against tampering by a system ofDNA 'watermarks'- acovert set of minor, distributed genomic changes - whose implementation details would remainproprietary.Barcode and Watermark Implementation

    A useful barcode system could be developed quite economically, adapting many of themethods currently available in genetics and molecular biology (such as PCR and site-specificrecombination), along with straightforward adaptations of existing or planned instrumentation

    1

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    5/59

    from the biotechnology sector (such as hybridization arrays). We envision that such a systemwould be composed of several distinct elements, as follows:1) A public database designed for recording and registering barcodes and all associatedinformation (organism, strain and variant, full genotype, release date, etc.)2) A private database designed for recording and registering watermarks against theirassociated barcodes3) A design strategy to generate thousands of unique, robust DNA sequences appropriate foruse as barcodes and watermarks in microbial organisms, and the 'universal' peR primersintended for these4) Bioinformatic methods to identify suitable and appropriate target locations in microbialgenomes that could receive DNA barcodes and watermarks5) A practical means of inserting both barcodes and watermarks into microbial genomes

    using site-specific recombination approaches6) Sensitive, efficient, multiplexed ways to amplify and read out both barcodes and

    watermarks in a (potentially) mixed popUlation ofmicrobial organismsItems (3), (4), (5) and (6), in particular, warrant additional consideration, and are covered indetail in the full report. There, we present some practical approaches for realizing each of theseelements using existing biotechnologies, and discuss the use ofvarious alternative methods. Wealso supply a worked example ofhow to barcode and watermark the genome of a microorganismthat has been fully sequenced, Deinococcus radiodurans (a bacterial strain of interest to the DOEbecause it can tolerate high levels of radiation), which has recently been engineered to reducemetabolize mercury, as well as to digest toxic compounds such as toluene and chlorobenzene.Finally, we suggest ways that barcodes and watermarks might be enhanced, in principle, insecond-generation designs, including the incorporation ofa fast mutational 'clock' into a specialportion of the bacterial genome that would allow the number of generations since theenvironmental release of tagged organism to be estimated.Recommendations

    JASON recognizes that introducing any heritable DNA label into a microbe - even onethat does not impart a phenotype and has no impact on its natural fitness - technicallyconstitutes a 'genetic modification.' Despite the popular stigma now being attached togenetically-modified organisms (GMOs), particularly those intended for food, we believe thenumerous advantages conferred by barcode labels in monitoring the spread ofmicrobes outweighsuch concerns, provided that barcodes perform benignly, as designed. However, the deploymentwould have to be preceded by an educational outreach program backed up by valid scientificdata from actual tests, before the general public will likely accept barcoding as a means oftracking microorganisms in the environment. That said, significant levels of support for abarcoding program may develop from members of the scientific community, especiallybiologists whose voices are now strong in support of environmental concerns, assuming that thesafety and efficacy ofmicrobial barcodes can be demonstrated in pilot studies.

    We conclude that a program for barco ding the microorganisms used in bioremediation isnot only feasible, but advisable. The Report offers a set of specific recommendations for a

    2

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    6/59

    comprehensive program aimed at developing and testing barcodes and watermarks. In brief, weurge the DOE to consider the following: Look into ways of adapting existing biotechnological instrumentation for use in keyaspects of barcode synthesis, insertion, and readout. Develop specific hybridizationarrays (DNA chips) for conventional barcode readout and for barcode readoutaccompanied by watermark readout. Sponsor applied research into adapting site-specific recombination methods to insertbarcodes and watermarks into bacterial genomes, including approaches based on(1) homologous recombination, (2) 'retrohoming' insertion by Group II intron vectors,and (3) excision-deficient bacteriophages. Sponsor basic research aimed at identifyingalternative and improved methods. Establish a working group to formulate uniform standards for microbial barcoding, whichwould standardize the database structures, as well as specify rules for the coding, design

    and disbursement of the barcodes and watermarks used. Perform feasibility studies to measure the actual stabilityof barcodes and watermarksintroduced by any of the various methods, their effect (i f any) on natural fitness, theirmutation and loss rates, etc. Develop a barcoding and watermarking 'testbed' program using one or a few modelorganisms, which would be barcoded, watermarked, grown up, and monitored in a trial

    release program (performance monitoring in a microcosm study). If the above developments prove successful, implement DNA barcoding andwatermarking standards for performance monitoringof all bacterial and fungal strainsused in DOE programs, and promote the global use ofbarcoded organisms inbioremediation.

    We feel that this is a leadership opportunity for the DOE. Success in this arena mayencourage the explorationofbarcodes for other purposes, such as tagging the microorganisms inthe American Type Culture Collection (ATCC), tagging the bioengineered organisms proposedfor commercial bioremediation purposes (e.g. digesting oil slicks), and tagging bacterial andfungal pathogens on the CDC List of Select Agents.

    3

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    7/59

    2. JASON Study JSR-03-305 DNA Barcodes & WatermarksThis study was conducted on behalfof the DOE during the summer of2003. The names

    of and institutional affiliations of the eight JASON Members participating in the study areshown. Their fields of academic expertise include biology, medicine, physics, mathematics,statistics, and computer science.

    JASON 2003 JSR-03-305. ~ t . , 1 II III II 11:111

    . J A S O ~ S t u d Y I on I ill II II 1'1t:OINA Barcodes & Wa'tJermarks :1111 I I III 1111111:A program to tag microorganisms genetically for th e purpose of identification

    Study Participants:Steven Block (Stanford U., Study Leader)David Donoho (Stanford U.)Terry Hwa (UCSD)Gerald Joyce (Scripps Inst.)David Nelson (Harvard U.)

    Tim Stearns (Stanford U.)Peter Weinberger (Google Inc.)Ellen Williams (U. Maryland)

    Uodasslfied

    Briefers We Heard:Three briefers were invited to discuss their work relevant to this study. Each briefermade a unique and valuable contribution to this study, and JASON is indebted to them for their

    input.Dr. Paul Jackson is a Laboratory Fellow and Technical Staffmember of the BiosciencesDivision at Los Alamos National Laboratory. His work is on developing a basic understandingofbiological threat pathogens, and he has been responsible for developing "the most maturemethods and associated reagents developed in the past several years for strain and speciesidentification.. .provided to the CDC and FBI and .. .currently in use.,,1 Dr. Jackson has helped topioneer the use of intrinsic genetic information, including natural sequence variations, tandemDNA repeats, and single nucleotide polymorphisms, in tracking and identifYing pathogenspecies.

    1 http://t8web.lanl.gov/people/rajan/CT2002/BIO/jackson.html

    5

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    8/59

    Dr. Andrew Ellington is the Wilson & Kathryn Fraser Research Professor in Biochemistry atthe University ofTexas, Austin. His lab works on the evolutionary engineering of molecules,metabolism, and organisms, including the design and selection of catalytically-active RNAmolecules. Dr. Ellington has developed catalytic RNAs derived from the Group II Intron ofLactococcus lactis to target the introduction of sequences at specific, pre-defined sites inbacterial DNA. This technology holds particular promise as a way of introducing geneticsequence tags into bacterial genomes.Dr. Ronald Davis is a Professor of Biochemistry and Genetics and the Director ofthe StanfordGenome Technology Center. His group is using yeast (Saccharomyces cerevisiae) to conductwhole genome analysis projects. Dr. Davis' group has made a nearly complete set of haploidand diploid strains (21,000 in all), each containing a deletion of one of the ~ 6 , 0 0 0 yeast genes.To facilitate whole genome analysis, each deletion is molecularly tagged with a unique 20-merDNA sequence. This sequence acts as a molecular barcode and makes it easy to identify thepresence of each deletion. Dr. Davis ' pioneering work provides an existence proof that geneticbarcodes can be made to work, and these have already shown their considerable utility inresearch context.

    ,JASON 2003' I

    I I ,

    Briefers Heard..............

    - Paul Jackson (Los Alamos National Lab)-Andrew Ellington (University of Texas)- Ronald Davis (Stanford University)

    Ron Davis and colleagues implemented a prototypical barcodfng schemeto help keep track of deletion mutants created for every one of the-6,000 Saccharomyces cerevlsiae genes (Yeast Genome Project).DNA BatCOdes &Wafennalf(s Undassified

    6

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    9/59

    INTRODUCTION

    Statement of the Problem" I ,

    J A S O ~ 2003 I ,I 1, 1

    We have a problem

    "It is estimated that more than 60% of DOE facilities have ground-water contaminated with metals or radionuclides. The only con-taminant that appears more often than metal and radionuclidecontaminants In groundwater Is chlorinated hydrocarbons. IWJrethan 50% of all soil and sediments at DOE facilities are contamina-tedM' DOE is currently responsible for remedlating 1.7 trillion gal-lons of contaminated groundwater, an amount equal to approxi-mately four times the dally U.S. water consumption, and 40 mil-lion cubic meters of contaminated soil, enough to fill approximate-ly 17 professional sports stadiums."

    DNA 8111'COdes&Wa/etma/f(s

    NABIR Strategic Plan (2001)LBNL-49054UnclassiHed :3

    The Department ofEnergy "has a 50-year legacy of environmental problems resultingfrom the production ofnuclear weapons,,2 and faces a massive cleanup problem on the landscontained within its facilities. Primary contaminants are found in both the soil and groundwaterat most DOE sites, and include halogenated hydrocarbons, acids, chelating agents, radionuclides,and heavy metals. It is estimated that 60% ofDOE facilities contain contaminated groundwater,and that 50% of the topsoil or sediment is presently contaminated. The scope o/the problem istruly immense, with a need to treat 1.7 trillion gallons ofpolluted water and 40 million cubicmeters of contaminated soil. According to NABIR (Natural and Accelerated BioremediationResearch Program), a DOE-funded interdisciplinary effort to explore the scientific basis forstrategies to reduce the risk of contaminants to humans and to the environment,

    "[M]any ofthe contaminated soils, sediments, and groundwater are believed to beimpossible to remediate with existing technology. Examples of such intractableproblems include the Snake River Aquifer in Idaho, contaminated groundwater atthe' 100' and '200' areas at Hanford, Washington, contaminated sediments in theColumbia River, and groundwater at the Nevada Test Site (DOE, 1995). The hugecost, long duration, and technical challenges associated with remediating DOEfacilities present a significant opportunity for science to contribute cost-effectivesolutions."

    2 Natural and Accelerated Bioremediation Research (NABIR) Program Mission statement, available athttp://www.er.doe.gov/production/ober/nabir/mission.html.

    7

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    10/59

    To assist in this considerable effort, a "genomic approach to waste cleanup" is currentlybeing explored, harnessing the considerable potential ofmicrobial biochemistry-incommunities ofliving bacteria or fungi-to degrade complex organic molecules, and to reduceor sequester chemicals, particularly heavy metals, thereby rendering the cleanup process moretractable. This, in essence, defines the process ofbioremediation. 'Bioremediation' can take onmany definitions, so it is worthwhile examining some of these at the outset. I i

    JAsoN 2063' 1I I

    DOE's "Genomic Approaches to Waste Cleanup" UNew environmental-restoration andwaste-treatment solutions basedonbiotechnology will minimize threats tohuman health andoffer opportunities forlong-term stewardship ofDOE lands. n

    DNA B8ItOdes &WIlIennaI*sMultiple Types ofBiorernediation

    Genomes to Life Programhttpjlldoegenomestollfe.onzlbenefltslcleanup.html

    Unclassified 4

    In its simplest form, known as 'intrinsic bioremediation,' populations ofmicroorganismsnaturally present in the water or soil adjust adaptively to accommodate any new stress in thechemical environment. While some indigenous microbial species experience toxic effects anddie off, others may be able to better cope with the changing chemistry, and possibly even derivemetabolic benefit from it. Those species that-individually or in combination-successful lymetabolize the pollutant will prosper. Over time, the resistant species will tend to occupy alarger niche in the environment, until such time as their food sources (Le., the pollutants orrelated compounds) give out. In essence, then, intrinsic bioremediation is just an alternativephrase describing the process of natural biodegradation.A straightforward way to accelerate biodegradation is to foster the growth of specific

    indigenous organisms: namely, those which are the most effective in metabolizing the pollutantsof concern. This can be done, in principle, by supplying certain limiting nutrients (or gases, suchas oxygen, or surfactants) that encourage the growth of the desired organisms, either directly orindirectly. Conversely, it is also possible to eliminate certain nutrients essential to anycompeting organisms that limit the growth of the desired species. In either case, altering the

    8

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    11/59

    environmental abundance of chemicals and nutrients for the purpose ofpromoting specific typesofgrowth and degradation is known as 'enhanced bioremediation.'

    Finally, it is also possible to introduce non-native microbial species into the environmentto carry out the desired degradation process more efficiently. Ideally, these exogenous specieswould have improved biochemical characteristics relative to those ofnative species (and possiblyother attributes as well). This approach is referred to as 'bioaugmentation.' Bioaugmentation isby no means a new or untried concept. For example, augmentation is routinely performed byhomeowners who use products such as Rid-XTM to improve the performance of heir septic tanksand cesspools. Rid-XTM has been a successful commercial product for over 50 years, andadvertises itself as being "100% Natural." Its active ingredients consist largely of dried bacteria,yeasts, plus some nutrients that help to colonize septic tanks and promote fermentation. Productslike Rid-XTM alter the natural balance offlora and fauna in the immediate environment of hetank, but are nevertheless widely considered to be far more 'friendly' to the environment thanharsh chemicals, like acids, bases and detergents, which may unclog pipes or tanks butfrequently cause further damage (and waste treatment problems) downstream.

    In its most advanced form, bioremediation involves the introduction ofmicroorganismsthat have been deliberately altered to have desirable properties, such as the genes formetabolizing certain pollutants, or the genes to tolerate extreme environments. Such strains maybe produced by a traditional process of selection and breeding, or they may be created bygenetically engineering. In the future, it seems likely that genetically-engineered strains ofbacteria will play an increasingly important role in bioremediation.

    r!.[I "J A S O ~ 2C)Q3' ,I i

    The Bioremediation Context I,I-=t?"'-=".'I.-i P . : ~ ; c : -i ..............._ ............... ..r Digestion of ~ t r o c h e m i c a l s , chloro- U(VI)02Z+ (Uranyl ion)-. U(IV)Oz (Uraninite solid)t fluorohydrocarbons other organic toxins mediated by anthroquinone disulfonatef by microbes & microbial communities uo,>' /. 00 , Immobilization of soluble heavy metals o , ~ . ~ OH I \ 0 u,_(e.g., MeHg) and radlonuclides m...-yoool ~ y " ' " Carbon sequestration (C02 fixation) for " ' , ~ ~ ,,,,,,.-lvyv ,,,.climate stabilization The ecological fate of microbes whose growth is accelerated ;n s;tu (enhancedbioremedlation), or are released 'into thewild,' (bioaugmentation) ;s generally unknown The role of lateral transfer in spreadinggenes useful for bioremediation in microbialcommunities sites ;s generally unknown Augmentation of bioremediation capabilitiesthrough genetic modification is a currenttopic of researchDNA BlJICOdes &Walennll/1(s Unclassified

    AQDSH z AQDS+ 2W

    Uraninlte precipitate onSflewonella onefdflnsis 5

    Irrespective of he origin of he microbes used, however, all bioremediation efforts leadto environmental changes - hopefully, beneficial ones. Ideally, microorganisms whose growth isfostered during the remediation process die of fnaturally once their job is complete, as thenutrients become limiting, thereby restoring the environment to a more pristine, original state.

    9

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    12/59

    However both introduced organisms and indigenous species whose growth has been enhancedmay linger in the immediate environment or spread beyond their original site, producingunintended consequences.

    To monitor the process ofbioremediation, it will be vital to establish a socially andlegally acceptable means o ffol lowing the growth and ecology ofall species usedforbioremediation. Towards that end, JASON explored the possibility of developing a robustsystem of genetic tags that would facilitate microorganism identification and tracking. Thissystem would be, in many ways, analogous to the process ofbanding birds (and other fauna) inwildlife studies, except that the tag would be incorporated into the genome itself.

    IIJASOfl 2003" III I I

    Banding the birds... - .......................... .. Spread of ground-borne contamination

    occurs on geological length scales (km),requiring comparably widespread dispersalof bforemediation agents The spread of oil slicks is no different Contaminated groundwater plume in TestArea North, 10 (lNEEL) treated with Na Bforemedfatfon agents that are Initially lactate to stimulate chlororesptratlonconfined to specific regions can, and do, leak Bioagents or blofilms attached tosolid matrices are helpful, but no panacea It is vital to establish a socially and legallyacceptable means of following the microbesused In bioremediation

    DNA BlJICOdes & WID/m!rlcs UnclassifiedRaptor migration studied by banding

    6

    If such a genetic 'barcoding' system could be developed for the microorganisms used inDOE bioremediation efforts and demonstrated to work successfully in the field, it mayeventually have much broader utility. Perhaps the most significant application beyond thebioremediation context might be the use ofan analogous barcode system to track the spread ofgenetically-modified organisms (GMOs) in the environment, a topic of considerable national andinternational interest.

    An astounding a number ofmicrobial species have unusual metabolic capabilities thatenable them to accomplish such feats as degrading petrochemicals, metabolizing halogenatedorganic compounds, reducing heavy metals such as mercury and uranium, and so on. Thefollowing list shows 16 promising bacterial species currently under investigation in projectssponsored by the DOE, alongside another 52 genera ofbacteria and fungi that are capable ofdegrading oil (either individually or in combination):

    10

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    13/59

    Scope of the issueMicrobes currently t I

    Interest for blorernediation:Addothiobacillus ferroxidansAgrobocterlum tumifociensBacillus lkheniformisBacillus megateriumBacillus subtiUsBacillus thuringif/Rs;Sfl8urkholderia fungorum UJ400Deinococcus radioduransMethylosinus trkhosporiumNitrososomans europaeaPhanaerochaete chrysosporiumPseudomonas fluorescensPseudomonas putidoRhodobacter sphaeroidesRhodopseudomonos palustrisShewanella oneidensis

    Additional aenera t I oll-dearadingbacteria and fungi:Achromobl>

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    14/59

    What a DNA Barcode is NOTThe phrase "DNA barcode" has already been used by some researchers to refer to someother things. To alleviate any possible confusion, we therefore begin by drawing a cleardistinction between DNA barcodes, in the context of this study, and these alternative uses of thephrase. First, a barcode is not a DNA tag. DNA tags involve the use ofprepared DNAmolecules containing known, amplifiable sequences as a means of tagging and identification.DNA tags are in widespread use today, and are commercially used to mark objects of value foranti-counterfeiting purposes. For example, DNA tags were used on tickets and merchandiseassociated with the 2000 Sydney Olympics, and also to mark footballs used in the NFL SuperBowl and hockey pucks used in the NHL3. Such tags consist of exogenous DNA sequencesmixed with chemical stabilizers. They are not heritable and they can become lost, degraded, ordiluted over time - although they are surprisingly robust. DNA tags are not suitable formarking self-reproducing organisms in a bioremediation context.'DNA barcode' has also been used, unfortunately, to refer any convenient, variable DNA

    sequence pattern already found in Nature as a consequence of evolution. Thanks to rapid andpowerful DNA hybridization techniques, such patterns can now be exploited, for example, todistinguish among otherwise similar organisms, to identifY the genus and species of DNA froman unknown source, or to catalog many diverse species. Since higher organisms all possessmitochondria, the variable portions ofthe mitochondrial genome sequence have provedespecially useful for categorizing eukaryotes4.

    I I, I ,JAsoN 2003', ;I ,First, what's NOT a DNA barcode?

    i' HOT a DNA Tag. - These use exogenous DNA sequences

    - These are not heritable> Can become lost, degraded or diluted

    NOT a DNA Signature or Fingerprint- These use endogenous DNAsequence information- These can be extremely useful,but they lack uniformity

    phylogenetictr.;,e forB. anthrads

    > Work much better in some species than others> Extensive typing of many isolates required

    DNA Barcodes &WatennlJlks UnclaSSified

    3 http://www.dnatechnologies.com

    9

    4 Hebert, P.D.N., Ratsingham, S. & Dewaard, J.R. Barcoding animal life: Cytochrome c oxidase subunit 1divergences among closely related species. Froc. Roy. Soc. Lond. B. 270;1524: Suppl. 1: S96-99 (2003).

    12

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    15/59

    In a similar fashion, most prokaryotes can be categorized by means of he sequences oftheir 23S ribosomal RNAs. However, such natural genomic sequences are not analogous to"barcodes" in any strict sense, because they do not represent manmade labels that have beenmanufactured and introduced into the organism for the express purpose of identification, as DNAbarcodes are. Such natural sequence variations are therefore best described as "DNA signatures"or "DNAfingerprints."

    More importantly, DNA fingerprints (when used as barcodes) do not represent auniversal form oflabel: they lack uniformity because they rely on variable, endogenousinformation. In order to use DNA fingerprinting, unique sequences identifYing each variant ofthe organism must first be identified and characterized, and this requires extensive collection andtyping of specimens. Furthermore, some species mutate more slowly than others. As a result, onoccasion it can be quite difficult to identifY sequences that reliably encode variation. A notoriousexample here is the anthrax bacterium: most strains ofB. anthracis differ remarkably little, withas few as 8 point differences among all the recent "Ames" strain variants.What a DNA Barcode ISA DNA barcode is an artificial, silent genetic marker consisting ofa relatively short stretch ofDNA carrying a designed sequence. In practice, a DNA barcode would be introduced benignlyinto the genome of the host organism as a sequence tag, in such a fashion that it conferred nophenotype. It would be heritable, so that all progeny carry the identical barcode. The sequenceofthe barcode would carry encoded information in a robust manner. For example, the barcodecould encode a serial number corresponding to an entry in a public database carrying furtherinformation about the strain. The barcode would be constructed in such a way that it would bestraightforward to locate and identifY in the genome, using established methods for DNAamplification and hybridization. A DNA barcode is overt, in the sense that its presence is not

    , I J A S O ~ 2003: J r

    iii I-

    What is a DNA barcode?..................... ...... An Artificial, Silent, C)VSltGenetic Marker

    -Inserted into endogenous DNAsequences in benign locations-Heritable- Coded to carry informationin a robust manner- Straightforward to find a identify- Readily read-Public use

    DNA BIl/COdes &Watennarks Unclassified

    13

    ISKN: 9039341

    5 00009 03834 4UPC barcode

    blirctode rllader 10

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    16/59

    deliberately hidden or encrypted: it should be readily detected using standard methods, and it isintended for general use in the public domain.What is a DNA Watermark?

    In conjunction with DNA barcodes, it is useful to develop a closely-related concept: aDNA watermark. Unlike the barcode sequence, which is a continuous length ofDNA designedfor easy detection, a DNA watermark carries information in a more distributed (holistic) manner.As a direct consequence, it is difficult-to-impossible to locate and read a watermark withoutspecial knowledge. Like a barcode, however, a DNA watermark should be heritable and conveyno measurable phenotype.

    ,JASON 2003' I

    , I

    What is :a DJ t1-\ Ida termad\ ?.................. _ ................. An Arttflctal, Stlent, CovertGenetic Marker-Inserted into endogenous DNAsequences in benign locations-Heritable- Coded to carry informationin a robust manner- Very hard to find or identify

    Gennan 10 Mali< banknote withportrait of K.F. Gauss

    - Read only with special knowledge early watermali

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    17/59

    The BarcodelWaterrnark Tagging SystemA correctly implemented system combining both barcodes and watermarks would confermultiple advantages for tagging and tracking microorganisms in the environment. Since allbarcode tags will carry common sequences alongside additional sequences unique to the tagged

    organism, rapid field testing for the presence or absence oflabeled organisms is greatlyfacilitated. The uniformity ofthe barcode system across phylogeny finesses the problem oflimited natural sequence differences associated with traditional identification systems based onnatural variation (fingerprinting). Other advantages of a barcode/watermark system includerobustness and homogeneity in the readout, immunity from several kinds of ambiguity arisingfrom sequence homoplasy (that is, convergent evolution towards the same sequence) andreversion (accidental return to the same sequence), as well as protection against tampering andforgery. Finally, it may even be possible to enhance the basic barcode/watermark system byintroducing special barcode sequences deliberately designed to mutate over time, but at ratesmuch faster than normal evolution. Such hyper-mutating sequences could provide a kind of'ticking clock' that changed some region ofthe barcode on a regular basis. In principle,hypervariable sequences might be used to determine the time lapse, measured in generations,from the release of he original tagged strain to the capture of the target strain.

    I ,JAsoN 2003: 1 I"I I

    WHY put in barcodes ~ ~ / r ~ ! Cs(fJEl(i\:::; ?............ ............... Provides a system for tracking & taggingmicroorganisms in the environment Field testing is greatly facilitated Uniformity across all bacterial phylogeny Homogeneity in responses of the readout Lends itself to multiplexing Uniqueness; protection against homoplasy & reversion Discourages tampering and forgery Offers additional possibilities- Ability to introduce faster evolutionary clock-More (e.g., self-terminating or self-limiting strains)?

    DNA 8a/COdes &Wa/ermaI1(s UnclassifiedIntroduction to the Bacterial Chromosome

    12

    To set the stage for a practical plan for the implementation ofbarcodes and watermarks,we begin by describing salient aspects of the bacterial chromosome. Most bacteria have one, orat most a few, circular chromosomes made ofDNA, each containing a unique origin ofreplication. In addition to the chromosome, a bacterium may have one or more much smaller,circular DNA elements called plasmids, each with its own origin of replication. (Indeed, the

    15

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    18/59

    distinction between a plasmid and a chromosome is mainly one of size, not of function. Someplasmids carry supplementary genes and are dispensable; some are not.)

    I I

    JAsoN 2003 ,j fIj I

    Bacterialgenomes:A parts primer....................................

    Size: -1-10MbpCircular map for chromosomes &plasm idsProphage

    bacteriophate (virus)D ~ t n t . . atedtntothe chromosome

    Insertionsequences_SNPs ---JSinai. Nucleottde Polymorphlsms

    DNA 88/Codes& w ~ s Unclassified

    Genes-(in operons)

    Regulatoryelements ----e.

    The bacterial chromosome consists mainly ofgenes (i.e., sequences that code forproteins) organized loosely into co-expressed groups, called operons, each with its own genecontrol region. But there are other notable features as well: see the figure above. These includeregulatory elements, such as the aforementioned control sequences, as well as a replicationorigin, short intergenic regions, and insertion sequences such as transposable elements. Alsonotable are DNA sequences where a bacteriophage (virus) has inserted itself into thechromosome and generally lies dormant. Such a continuous segment of intact viral DNA istermed a "prophage" sequence. Imbedded prophages can, with appropriate stimulation, excise .from the chromosome, then replicate and proceed to form live virus particles, eventually lysingthe host cell. Hence, dormant phage sequences are said to be 'lysogenic.' DNA sequences thathave lost by mutation their ability to excise from the genome are called 'cryptic prophages.'Also illustrated in the figure are examples oflocations where the genomic sequence may havemutated away from the wild type by point changes to single bases in the DNA. These are calledSNPs (Single Nucleotide Polymorphisms).Where to Place Barcodes & Watermarks

    To discover places where barcodes might be placed in a candidate organism, it isextremely helpful to know its complete DNA sequence. Fortunately, the full genomic sequencesfor many species of important bacteria are now known, and plans are already in place tosequence all major human pathogens, as well as most ofthe bacteria vital to agriculture andindustry. The DOE is also sequencing several of the major organisms of interest for

    16

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    19/59

    bioremediation purposes. The sequencing for one ofthese, D. radiourans, is now complete, andseveral more are in the works. In the near future, whole-genome shotgun sequencing of anyorganism with a genome as small as a bacterium should not present much of an obstacle, andcould be accomplished in a matter ofdays. We will therefore assume that the genomic sequenceof any microorganism selected for DNA barcoding is known in advance.

    WHERE do we put in barcodes ; ~ ~ ' f / ~ l CS'fJfElfkj ?..................... ............ .It is assumed that we know the genomicsequence of any organism to be barcoded.

    Intergenic regions orbarcodes Phage attachment loci Pseudogenes Cryptic prophage sequences 3rd b a ~ e d e g ~ n e r a t ~ positions i n } rw ~ r ! ~ J protem-codlng reglons . Tandem repeats) etc. } Insertion Elements, Transposol'lS good for certainother trfcks

    DNABBlCodes &Walemll.rl(S Unclassified

    Barcodes - Where in the genome?. . . . . . . . . . . . ........... .......... . .Bacterial genornes are a 'target-rich environment' for barcode insertion

    Intel'1entc sequencesNon-coding regions between operonsCryptic prophage sequencesProphages that have lost (bymutation) their ability to excisefrom the chromosome andpropagate as normal bacteriophage

    Barcodes inserted into these sequences can, subject toconstraints, alter the genotype without altering the phenotype.DNA Barcodes &Watermarl

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    20/59

    Clearly, it is not desirable to insert a DNA sequence into the genome in any place whereit will disrupt either genes or genes expression, and thereby confer a phenotype. There areseveral specialized types of chromosomal region, however, that may accommodate theintroduction of exogenous DNA sequences without producing a measurable phenotype. Someamount of experience and experimentation may be required to identifY the most suitable regionsfor the placement of barcodes and watermarks, in practice. Generally speaking, barcodes arebest placed in some of the following locations: intergenic regions (regions between genes,particularly those situated between operons ofopposite polarity), phage attachment loci(positions where lysogenic phage insert), pseudogenes (genes that have lost their function andare no longer expressed as full-length, functional proteins), or inside cryptic prophage sequences.For watermarks, a scattered set of silent, single-base changes (SNPs) represent attractivecandidates, particularly when such changes occur in the degenerate, 3 rd-base positions of codonsfor amino acids. This approach is described in greater detail later in this study report. Forpurposes beyond barcodes or watermarks, tandem repeats and insertion elements offer otheropportunities: these, too, are described in more detail in a later section of this report.Despite the comparatively high density ofprotein-coding regions found in prokaryotes as

    compared with eukaryotes, the bacterial chromosome is still a fairly "target-rich environment"for the insertion ofDNA barcodes.Anatomy of a BarcodeWe now consider the essential elements of a prototypical DNA barcode. As a practical matter,the full barcode sequence should be short enough that it can be produced with few errors duringa single round ofDNA synthesis at a reasonable cost. As of today, DNA oligos can be orderedfrom commercial suppliers in lengths of 80-100 basepairs, so this represents the current practicalupper size limit. However, one may anticipate that somewhat longer barcodes may becomefeasible in the future, should these be required.

    !JASON 2003 t II jl l

    Constructing a DNA barcode................ Use to code for a unique serial number in a database Appropriate length: use n = (20-35) bp 'word' (N - 10'2-102')

    - flanked by 2 x (-20-25) bp primer regions for PCR 60-85 bp total, suited for single-round DNA synthesis- Generic primers form a universal signal for tagged organisms

    No significant secondary structure should be predicted Should contain -50% G:C content (natural) Robust against PCR errors, mutation and false hybridization

    - No two barcodes related by fewer thanm = 3-5 base changes; use Reed-Solomon or similar ECC- Can use frameshifted primers (20/25 bp) for redundancy

    JASON estimates 10,000-100,000 unique codes are usable for n = 20DNABarcodes &W&t9nnam: Unclassified 16

    18

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    21/59

    The overall barcode sequence is conceived to consist of hree parts: a single, centralregion containing a unique DNA code sequence, flanked on either side by two shorter regions,designed to serve as primer sites for the PCR amplification of the central region. The centralportion would code for a unique number corresponding to a serial number entry in the barcodedatabase. A sequence of20-35 bp could, in principle, generate from 10 12_1021 unique serialnumbers, but there are practical considerations that severely restrict this limit (see below). Theflanking regions would be complementary to a single common set of 'universal barcodeprimers' that would be used to initiate a PCR reaction to amplify (and thereby to pull out ofacomplex reaction mixture) any DNA fragments carrying these flanking regions. The existence ofan amplicon derived from the common primer set would, therefore, constitute a generic signalfor the presence ofa barcoded organism, irrespective of its serial number. This could serve asthe basis for a rapid field test for barcoded organisms. The sequences ofthe central regions ofany amplicons would indicate the organisms themselves. Common flanking primers would befrom 20 to 25 bp long apiece (see the illustration below), so the overall length ofthe barcodeinsertion might range from 60-85 bp, which falls readily within current manufacturing limits.There are further criteria that are important to meet in the design of a practical barcode.To prevent self-priming and false amplification during the PCR reaction, a barcode should nothave any propensity to form a DNA hairpin structure in single-stranded form (i.e., it cannot beself-complementary). Furthermore, to remain biocompatible, the G:C content of the barcodeshould be roughly similar to that found in living organisms, i.e., ~ 5 0 % . Finally, to make thesystem robust, it is advantageous to encode the central region in such a way that no two serialnumbers are related by fewer than 3 (or more) base changes. In that way, barcode serial numbersmay develop up to 3 random mutations but still be unambiguously identified. Finally, by usingcommon primer regions containing 25 bases, in which just 20 contiguous bases are actually usedfor amplification in the PCR reaction, it is possible to use multiple sets of redundant, 'frameshifted' primers. Frame-shifted primers can be used to overcome potential problems offalsepriming at alternative sites that might occur by chance in some species

    I I ,IJ A S O ~ 2063" I

    1 I' ! II ..

    Anatomy of a Barcode......_.......................... ..20/25 bp primer blndilllsite 20 bp barcode 20/25 bp primer binding site___ .....__ .....

    - - - ---"V " ,- --common primer 1 C T T C ~ G C A C G G ! A C A G ~ G common primer 2

    Amplify by peR with primers directed against commonsequencesReadout by hybridization microarray ('DNA chips')

    DNA 88/COdes &We/ermarIcs U n c ! a ~ i f i e d 19

    17

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    22/59

    A Practical Barcode is Shown (figure above)In this implementation, the serial number region is encoded as a 20 bp region, while thetwo flanking primer regions are each 25 bp long, of which 20 bp are used to prime the PCRreaction, as indicated by the arrows. The overall barcode length is 70 bp. Once amplified byPCR, DNA barcodes could be ' read' without a need for direct sequencing by using a suitablehybridization arrays carrying complementary sequences for all barcodes found in the database.This process is described in more detail later in this report.

    How Many Practical Barcodes Exist?We now turn to the question of how many unique serial numbers exist, from among the1012 possible choices offered by N = 20 bases, once all the design criteria are met for G:Ccontent, lack of self-priming DNA secondary structure, and the requirement that all codes beseparated by several mutations. Are there enough codes to go around? JASON looked into thisquestion in detail. The bottom line is that there are sufficient codes to tag all the organisms of

    interestfor the foreseeable future, provided that a central coding region of20-24 bp (or longer)is used

    tiI

    , , JJ A S O ~ 2003 'I I

    If I I Lots of sequences are availableBarcode "Drake equation" (I)

    i.. a . . . . ._a . . . . . . . . . .__ . . . . . . . . .__ ....o . .I' Sequence length = . N= 4" sequences are possible. 20 bp -1 012 sequencest. For ECC, no two sequences may be related by fewer than m mutations (base changes).

    Nusab/e = (N F)= 4n (FECC FG:c FN020 5 )= 1+3n+(;)3' +6 +(;)3m ( 3 ~ m , fornm

    FCC ==l/N; (3:)m -2.4xlo-7form=5GFa:c(tl) == erf(tlJ2n) ~ 0 . 5 0 f o r 5 %

    DNA 8111COdes&Walemlm Unclassified G:C 18Sequence availability is examined in detail in Appendix A. However, a simple way tothink about the numbers is to construct an analog ofthe "Drake Equation", which was used toguesstimate the chance of finding intelligent life elsewhere in the universe by multiplying out allthe independent probabilities that underlay its likelihood. To form an estimate ofthe numbers ofuseful barcodes, we assume that the design criteria are statistically independent, and thereforemultiply the fraction of codes with normal G:C content (FG:d by the fraction that don't formhairpins (FNo2) by the fraction of codes with sequences separated by m or more point changes

    20

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    23/59

    (FECC). This is then multiplied by the number ofpossible codes, N =420, to estimate the usablenumber, Nusable. For random sequences, it turns out that Fa:c is close to 50%, assuming that oneaccepts sequences in the range of45% to 55% G:C content. Furthermore, FNo2 is found to besomewhere around 30%, assuming that a hairpin will form from three or more complementarybases whenever these are separated by a loop of three or more bases. (A more accurate estimateof this fraction can be obtained by running the M-FOLD program against all sequence candidatesto calculate the free energies ofpossible secondary structures and rejecting all those sequencesthat form stable hairpins). By far the most stringent criterion for rejection, as one might easilyanticipate, comes from FECC, which is estimated at 2 x 10-7 or less. Accepting these values, onewinds up with the rough estimate N(20)usable:::; 40,000.

    Barcode "Drake equation" (II)Effect of secondary structure: No hairpins or -30% usable," = 20'primer dimers' desired. Min. loop size = 3. ~ ~ : ---q- random sampttng (of 10,.__ 2 25% Zuk"r's MFOLD program ,

    hairPin:: l i ~ ! ~ : ~ O loop B ! ! ~ : + - ' = S ~ = : 3 ~ ~ - - : : - - ~ = = _ - = = _ - - = - ~ = - I r - - - - - - - . - . - . - - . - . ~ r _ . _ _ . ~ ~ 5 % t ~ ~ ~ ~ ~ ~ = = ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ 0%20 25 30 35 40tag length n

    n-2JM =Ln-2s-B+ 1=+(n-2s -2Xn-2s -1)

    B=3 M(s = 3)=+(n-7Xn-8)= 78

    N(20)usable 40,000 = more than we'll need. Or, choose n>20.

    Barcode Drake Equation (III)................_............... .

    III 1.0E+10c 1.0E+09l::J!l 1.0E+08IIIQl 1.0E+071.0E+06

    (U 1.0E+050Ql:c 1.0E+04(UQl 1.0E+03II::J'0 1.0E+02.... 1.0E+01l.0E 1.0E+00::Jz 20 21 22

    DNA Bart:od&s &Watermarlrs

    23 24 25 26 27Length of barcode, nUnclassified

    21

    28 29 3020

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    24/59

    A more careful estimate can be prepared by computer simulation, and simplyenumerating (either exhaustively or by statistical sampling) the fraction of randomly-selectedsequences matching all the design criteria, leading to the graph shown above. If it is arequirement to correct 4 or fewer base errors, then there are of order 10,000 barcode sequencesavailable for a 20-mer barcode, ranging up to 100,000 codes for a 24-mer. To correct 5 or fewererrors, one typically finds 10-fold fewer sequences than for the case of4 or fewer errors.Constructing a DNA Watermark

    J A S O ~ 2063' III "!;II 410 artificial single nucleotide polymorphisms (SNPs)I 'salted' throughout the genome 004. . . . ... . . . . . . . . . . . . . . . . . 04 ,. ..... . . . . . . . .04. . . . . . . ...........04 . . . ... . . . . . . . . . . . . . . . . . 04. . . . . ...Constructing a [)J 1/\ ";)cl t 2 r r J E l r J ~

    Used to protect the barcode- Against intentional tampering (Bo/W1)- Against lateral gene transfer (B1/Wo) Robust against detection- No phenotype- Error rate in genome sequencing 24 MB SNP number- Variation among natural isolates> SNP number- Need to know the flanking regions to read- May use natural SNP variations to guide watermark choices

    > i.e., an unnatural combination of natural SNPsDNA 88fCOdes&Walenn8l/(S Unclassified 21

    As discussed, a DNA watermark is a cryptic series of changes to the genomic sequencethat produce no discernible phenotype. The watermark is designed to be difficult-to-impossibleto detect and read without special (proprietary) information. We envisage the use ofDNAwatermarks directly in conjunction with barcodes, as a means to protect the integrity of thebarcode system for marking organisms slated for bioremediation purposes. [Clearly, however,watermarks could equally well be developed for applications independent ofa barcode system,whenever clandestine genetic marking alone is desired. We will not pursue this possibility here.]There are many ways, in principle, to generate and encode DNA watermarks: some ofthese schemes, by analogy to digital watermarking solutions in the computer world, can be quitesophisticated. However, one particularly straightforward system would be to introduce several

    single-nucleotide polymorphisms (SNPs) at scattered locations throughout the genome. Asimple watermark might consist of4-10 such SNPs. All barcoded bacteria would initially bewatermarked (B=1/W=I). Organisms subsequently recovered without their barcode butnevertheless carrying the watermark (B=O/W= 1) would be candidates for strains that may havebeen tampered with. Conversely, organisms carrying a barcode but missing the watermark(B=lIW=O) would be candidates for strains that picked up the barcode by some form ofgene

    22

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    25/59

    transfer, either accidentally (by lateral transfer) or as result ofdeliberate tampering (engineering).A series of ust 4-10 SNPs distributed throughout a typical bacterial chromosome may beexceedingly hard to detect without detailed advance knowledge of their identities andchromosomal locations. The error rate for whole-genome sequencing is such that the number ofmistakes made while sequencing the entire organism would exceed the total SNP number.Furthermore, the number ofDNA changes associated with the watermarking process is likely tobe smaller than the number ofSNPs found among natural sequence variants of any givenbacterial species. Finally, if desired, a very cryptic watermark could be created by producingsome unnatural combination ofnaturally-occurring SNP variants.Introducing Barcodes & Watermarks

    II 1JASON 2003" ,

    I I, .

    HOW do we write barcodes ' r / ~ ! t e r f f l ~ l r ~ / , , ~ ?.at . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Use Site-Specific Genetic Recombination Homologous Recombination

    - loop-in/loop out with selectable markers 'Retrohoming' insertion by group II introns

    - engineer exon binding sites to suit Prophage Insertion

    - with defective excisionDNA 88/Codes &Wa/etmarlls Unclassified 22

    JASON considered a number ofmechanisms for inserting barcodes into a bacterialgenome. Broadly speaking, three such mechanisms seemed particularly promising: (1) usinghomologous genetic recombination in conjunction with selectable markers to target barcodesequences to selected sites, (2) 'retrohoming' by genetically-engineered Group II Intronscarrying barcode information, again targeted to selected sites, and (3) insertion ofmodifiedlysogenic bacteriophage into insertion sites as prophages, carrying barcodes inside phagesdesigned with defective excision.

    The last of these three mechanisms, which takes advantage ofprocess by which lysogenicphage insert at specific attachment sites in the chromosome, is the least general and arguablymakes the greatest demands on a would-be fabricator. For every bacterial host species to bebarcoded, it is first necessary to identifY a suitable lysogenic phage that can infect the host (therewould be no a priori assurance that such a virus could be found for all species of interest). The

    23

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    26/59

    site ofphage insertion into the host chromosome would then have to be mapped, and the phagewould also have to be sequenced, so that the genes required for its chromosomal attachment andexcision could be identified. A suitable mutant would have to be created that was deficient inexcision. Furthermore, a lysogenic prophage carrying a barcode tag would introduce excessDNA into the bacterial chromosome beyond that needed purely for the barcode itself For thesereasons, our study chose to concentrate mainly on the first two alternatives, above. However,there may be instances where prophage-based barcoding is both practical and effective (and weexplore one ofthese later).Placing Barcodes & Watermarks by Recombination

    Provided that the sequence in the immediate target region is known and that a suitablegenetic marker can be identified that can be selected both/or and against, it is possible to insertan arbitrary region ofDNA into the chromosome, by taking advantage ofthe process ofhomologous recombination using a "loop-in, loop-out" procedure. Bacterial species differwidely in the degree with which they support genetic recombination, but many species can beinduced to recombine when the selection pressure is high, as with (for example) antibioticselection.

    The following two graphics illustrate how homologous recombination is used tointroduce a sequence, and Appendix B describes the entire process in greater detail. Becausehomologous recombination can insert a sequence without introducing any flanking DNA (i.e., itgenerates no additional remnants), it is also suitable for generating the SNPs required forwatermarking, as well as for introducing barcodes.

    . ,J A s o f ~ 2003 .

    II I

    Homologous recombination..........................................

    W"Wii.,,::.:.,,*l, ~ = = : ~ ~ = ~ ~ : ~ ~ ~ e -.J plasmid contains a segment of chosenX insertion site with the balfcodeplasmid DNA (chromosom::a:.1DN=A:- __ IOOp-tn).Select for marker l (2 copies flank the marker)-I . Wi,m3*I

    Select against marker l- - - 1 1_.. . . . - - mutation Is transferred to chromosomal DNA

    DNA Barc:odes &Watennarlcs Unclassified

    24

    Loop-out of the integratedsequences by a reversereaction yields aperfect replacement

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    27/59

    " : l jIJASON 2063 1' , I" I !I l } Leaving (almost) no trace:eliminating a marker by loop-out......._......._.............. .

    Problem: In some other methods, the marker used in construction of thestrain remains and alters the phenotype (usually, drug resfstance)!Solution: Eliminate the marker by recombination using flanking repeats.

    recombination sitesflanking the markeriI' "t _11;;:&3__ _ : < ' ~ q : - , :. : ; ' : ~ " , ' "J

    recombination at. I

    sequence between fs eliminatedDNA 811COdes &Wa/emltv1cs Unclassified

    Placing Barcodes by Group II intron "Retro-homing"24

    A promising method for inserting a barcode at a targeted location is to take advantage ofthe natural ability of certain catalytic RNA sequences (ribozymes) to selt:splice into DNA andRNA. The following three graphics outline the procedure, which is described in greater detail inAppendix B. This approach has the advantage ofbeing 'universal,' in the sense that it is

    R N A ~ exon" "1 "RT + = ~ ~ t ~ t

    5"E 0 ~ ~ ~ n t( ) ~ ~ ~ n 5 ' E ~ " t

    5 ' ~ ~ ~ " " 5'E .F"'- '

    A. Lambowilz &colleagues, U. Texas

    Group II intron-basedbarcode insertionProblem: Homologous recombination doesnot always work efficiently in all strainsSolution: Use engineered group 1\ introns(catalytic RNA/enzyme complexes)Group 1\ introns use "retrohomtnl" to insertinto specific DNA sites. The active elementincludes the intron RNA plus an intronencoded reverse transcriptase (RT).Advantage: UniversalityCan be engineered to target any DNA sequenceMinimal dependence on host factors forintegration

    25

    Fully active in recombination-deficienthosts

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    28/59

    independent of he properties of the host organism, and it bypasses any need for homologousrecombination, which may not work efficiently in certain bacterial strains. Dr. A. Larnbowitz'sresearch group (u . Texas) has engineered Group II Intron ribozyme sequences derived fromLactococcus lactis to serve a vectors for targeted DNA insertions. Such vectors alsoconveniently carry a gene for reverse transcriptase, which is required to synthesize DNAcomplementary to the RNA that's spliced into the chromosome.

    (

    I' ,JASON 2003:' ,I II

    Group II intron engineering (I)......................................

    5'

    DNA Baroodes & Wa/ennarlcsI

    J A S o f ~ 2003 '

    Ll.LtrB from Lactococcus tactis isthe best-characterized group /IORF intran

    Group /I introns have an openreading frame (ORF) that encodesthe reverse transcriptase whichcatalyzes integration.This ORF can be deleted andreplaced with the desiredDElfCodc- .sequence. The ORFreverse transcriptase can besupplied by another plasmidduring integration, or placed onthe same cassette as the intron.

    Unclassified 26

    Group II intron engineering (II)I ..............................._ ... ..I ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ................... . .i

    Target DNA5' ACCCACO';CGA T C G T ~ ~ ; ~ ; ; , ~ ; ; : .d t ~ ~ ~ i r c A T T T T T AAT 3'&."., '"".""-, . - ~ " . " " . ,"".,,,.'''.,,,, . ..TGGGTGCAGCTAGCACTTGTGTAGGTATTGGTATAGTAAAAATTA!\l 1ll I IB52 1 B51 18' !

    II 5'exon 3' exon -L-______________________________ -J

    The EBS1, EBS2 and 8sequences of the Lt.LtrBintron RNA pair with thetarget DNA duringintegration.Changing these sequenceschanges the sequencespecificity of integration.

    The E8S1, E8S2 and 8 sequences can be directly engineered to match aprecise target site. But this sometimes results in inefficient integration,so .. Can use an in vivo selection to identify randomly mutated Intron RNAsthat wilt integrate into a target sequence ( K a ~ et aI., NaWf/! Biotech. '01)

    DNABarcodes &Watent'l8lks Unclassified 28

    5 M. Karberg et al. Group II introns as controllable gene targeting vectors for genetic manipulation of bacteria,Nature Biotech. 19:1162-7 (2001).

    26

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    29/59

    Reading Barcodes & WatermarksUsing appropriate technology, it should be possible to read DNA barcodes and

    watennarks by PCR amplification followed by DNA hybridization, without any need tosequence the host genome. This should make rapid field testing possible, for example usinghand-held devices. Moreover, the entire dataset ofbarcoded organisms could be scored at thesame time with a single round ofhybridization, using DNA extracted from environmentalsamples containing potentially complex mixtures of organisms, both tagged and untagged.

    ,JASON 2003" ,

    1 I I

    HOW do we read barcodes 2 : i / c J t : : : ( f f J 2 . l r J ~ : : ; ?........................................-Barcodes- Need to read region flanked bystandard primers-Solution: peR and hybridization arrays

    - Watermarks-Need to read selected SNPs in context-Solution: Molecular Inversion Probes (MIPs)

    > CCread SNPs with MIPs"-Assay with same hybridization arrays

    DNA 881COdes &Walelmarlcs Unclassified 28

    For barcodes, the readout process would first consist of extracting and partially purifyingDNA within an environmental sample, using protocols and sample-handling procedures similarto those currently employed for DNA field testing. The DNA would then be mixed with a set ofstandard primers (20-mers) complementary to the common flanking regions ofthe barcodes,6followed by PCR amplification. The presence ofamplicons in the reaction mixture wouldindicate the presence of a barcoded organisms. Specific barcodes would be read out using ahybridization microarray that carried sequences complementary to each barcode.

    In most instances, DNA microarrays would be designed to check for barcodes but notwatennarks. Such readout devices would remain in the public domain. Under certain circumstances, however, it is desirable to check both the barcode and its associated watermark (e.g., tocheck for counterfeit organisms). It turns out that it is equally feasible to use hybridizationtechnology to read the SNPs used to watennark bacterial genomes, once again bypassing anyneed for direct sequencing. One practical approach to amplify and identify targeted SNPs by6 An overlapping set of frame-shifted primers could be used to improve the signal to noise of this step, if necessary,as discussed earlier. Provision for frame-shifted primers was made by increasing the common flanking regions to 25basepairs, of which any 20 contiguous bases may be used for priming the peR reaction.

    27

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    30/59

    hybridization was developed by Dr. R. Davis (Stanford Genome Technology Center), as is called'MIPs. ' (Molecular Inversion Probes)7.MIPs work as illustrated in the following graphic. In brief, a DNA oligo is designed that

    forms a nearly complete circle, hybridizing directly with genomic DNA on either side oftheSNP to be tested, but leaving the base opposite the SNP unpaired. The oligo is composed ofthree different types of engineered sequence: (1) regions designed to pair with the genomicDNA flanking the SNP to be tested (shown in black), (2) generic sequences that arecomplementary to each of two common primers used for later PCR amplification (red and blue),and (3) a SNP marker tag that uniquely marks the particular oligo (purple). The oligo is thenallowed to pair with the genome, and the reaction mixture is divided into 4 aliquots. Eachaliquot is then incubated separately with a single nucleotide (A,T,C or G) plus ligase. Closedcircles are formed only in the reaction mixture that's been incubated with the complementarynucleotide. An exonuclease digestion follows to degrade all DNA not formed into circles. The 4reaction mixtures are then incubated with common primers, which are arranged back-to-back asshown, and amplified by PCR. Finally, the identities ofthe amplified circles are scored byhybridization against an array that detects the unique marker probe. Knowledge of both thehybridizing marker probe and the single nucleotide leading to successful amplification(circularization) uniquely maps the identity ofthe SNP.

    Molecular Inversion Probes "MIPs"Common

    commonQ T TPrimer 1 TT T3

    tAssay: s ~ o

    From: Ron DavisStanford GenomeTechnology Center

    Extend 3' end of 'padlock' in the presence of a single nucleotidespecies (here, T).Ligate extended probes to form circles. (Exo digestion to clean up.)Select and amplify the circularized probes using common primers.Perform this parallel assay for each of 4 dNTP species in turnto reveal SNP genotype (here, A) on hybridization array againstthe unique SNP marker tag associated with the MIP.

    The technology for MIPs-based detection ofSNPs used for watermarking and thoseneeded for barcode detection can be combined into a single DNA hybridization array, whichwould have proprietary uses.

    7 P. Hardenbol et aI. Multiplexed genotyping with sequence-tagged molecular inversion probes. Nature Biotech.21 :673-678 (2003).

    28

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    31/59

    A Worked Example: Barcoding & Watermarking the Model Organism, D. radioduransTo frame the discussion somewhat more concretely, JASON explored a specific example

    ofbarco ding and watermarking. The strain we selected for further study was Deinococcusradiodurans, sometimes jokingly referred to as "Conan the Bacterium" for its astounding abilityto tolerate stress and punishment. It was first discovered in 1956 in Oregon in canned meat thathad spoiled despite exposure to X-rays. D. radiodurans is among the most radiation-resistant ofall known organisms: it can withstand up to 1.5 Megarad acute dose and can grow continuouslyin the presence of radiation levels of6 kiloradlhour. Moreover, it can cope with extendedperiods ofdesiccation, starvation, UV light, and high levels ofperoxides. It is nonpathogenic, itis transformable, and it is quite commonly found throughout the biome in the soil. Muchattention has recently focused on D. radiodurans as a candidate organism for use inbioremediation, particularly at DOE wastewater sites where radiation levels can reach10 mCilliter or more. The genome ofD. radiodurans has been fully sequenced8 and analyzed9,and the organism has recently been genetically engineered to carry genes from E. coli that allowit to reduce Hg(II) and certain other heavy metals lO

    l . ,J A S O ~ 2003' ,I II I

    Amodel organism barcode_ .m . . . . . . . . . . . . . . . . . . . . . . . . e l l "

    [ {)einococcus rac/iodurans "Conan the Bacterium"i -Withstands 1.5Mrads, desiccation, starvation,UV light, hydrogen peroxide; grows well at 6 kradfh

    - Fully sequenced, 3.2 Mbp (White et at. Science '99)> 2 chromosomes (2.6,0.4 Mbp), 1 megaplasmid (177 kbp)

    - Nonpathogenic; found throughout biome in the soil Transformable; has been engineered for bioremediatfon- MeHg reduction (mer operon) (Brim et al. Nature Biotech.'OO)

    - growth on toluene, chlorobenzene. Lysogenic phage identified (Hatfull It Sarkis, Mol. Micro. '93)

    - L5 mycobacteriophage family

    D. radiodurans,dividing

    - Excision gene known (DR145S) Has mu prophage in genome (Morgan et al. JMB '02) L5 mycobacteriophageDNA B8JCOdes &Walennm Unclassified 30

    8 O. White et al. Genome sequence of the mdioresistant bacterium Deinococcus radiodurans Rl. Science 286:1571-1577 (1999).9 K.S. Makarova et al. Genome of he extremely mdiation-resistant bacterium Deinococcus radiodurans viewedfrom the perspective of compamtive genomics. Microbiol. & Mol. BioI. Rev. 65:44-79 (2001).10 H. Brim et aI., Engineering Deinococcus radiodurans for metal remediation in mdioactive mixed wasteenvironments. Nature Biotechnology 18:85-90 (2000).

    29

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    32/59

    The D. radiodurans Rl genome (3.2 Mbp) consists of two chromosomes and twoepisomes, a normal-sized plasmid and much larger 'megaplasmid'. Depending on thechromosome or episome, between 81 % and 91 % of the DNA codes directly for protein; the restconsists of either control sequences or regions containing repeat content, ranging from 1.4% to13%:,I J

    JAsoN 2003" 1I I ,

    The Deinococcus radiodurans genome 1 ............................. .

    Selecting the Barcode Insertion Site

    Choosing theDeinococcusbarcode site

    Site must tolerate insertionwithout altering phenotype,and be genetically stable.Top choice: Interaentc reaionsWe select candidate sites by:o Identification of phageattachment siteso Bioinformatic analysis ofthe sequence

    The bioinfonnatic content ofD. radiodurans genome offers ample opportunity forbarcode insertion. Candidate locations include both phage attachments sites and intergenicregions, as discussed. A lysogenic prophage has been identified in the D. radiodurans sequence

    30

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    33/59

    from the L5 myocobacteriophage family whose excision genes are already known I , as well as acomplete mu prophagel2

    A promising site for barcode insertion would be in the region immediately adjacent to theinsertion site ofthe resident mu prophage (illustrated in the graphic below). Between geneDRA0074, which codes for the mu repressor, and gene DRA0073, which codes for a transporterATPase used by R. radiodurans, lies a short, 493-bp region that is almost certainly noncoding.This supposition is reinforced by the fact that the ATPase and repressor genes are transcribedwith opposite polarities, as shown. Selecting a specific location comfortably within in thisregion, we pick a site between the C and G bases in the intergenic sequence

    .. . AACCCTCAGCJ-GAACGAATGC. .located 200 bp upstream ofthe start codon ofDRA0073.

    jJASON 2003" ;

    I I ~ ~

    , Barcoding Deinococcus radiodurans! I .a be. au .a ........ a .a . . . . . . . I I)IH1M75 DRA0016

    DRA0074Mu prophagerepressorAnnotated genes DRAO074 - DRA0119 are a Mu prophage

    Insert barcode herelTarget barcode to the 493-bp region between genes

    DRA0073 and DRA0074f TjICAGTGGAACCCTCAGCGM.CGAATGCGTCT' 200 bp upstream of DRAOO73 ATG startf

    DNA Bercodes &Walennarks Unr.:1.ass1fied

    Choosing the Watermark Sites36

    To watermark D. radiodurans, one takes advantage of he intrinsic degeneracy of the 3-letter amino acid code. There are 43 = 64 possible 3-letter codons, but only 20 amino acids.Three codons code for stop sequences (T AA, TAG, and TGA), leaving 61 codes for amino acids(i.e., 61-20 = 41 codes are 'extra '). As a result, all but two amino acids (methionine andtryptophan) have more than one corresponding codon, and some have as many as 6 (serine). WeI I G.F. Hatfull and GJ. Sarkis. DNA sequence, structure and gene expression ofmycobacteriophage L5: a phagesystem for mycobacterial genetics. Mol. Microbiol. 7:395-405 (1993)12 GJ. Morgan et al. Bacteriophage mu genome sequence: analysis and comparison with mu-Iike prophages inHaemophilus, Neisseria and Deinococcus. J Mol. Bioi. 317:337-359 (2002).

    31

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    34/59

    begin by consulting the codon usage table for this organism: the chart below shows both theproportion and number of each of the D. radiodurans codons. There are 973,776 codons in all,and 105,000 11 %) of these happen to code for the amino acid leucine. There are four possiblecodons for leucine (CTT, CTC, CTA, CTG). Ofthese, two are used very infrequently (CTT,CTA), and the remaining two account for 94% of all the leucine codons (CTC, CTG).I I

    J A S O ~ 2003" ,I I ' Watermarking Deinococcusat selected 3rd base positions

    , T , C I A I G! rr p t , ~ IF) '"TCT'StIIS\ ; TAT T)I ',YI 'Tor lei -fTl. . . . . .__ . . . . . . . . . . . ,011__ ..... I ~ i 1'! 1TC"I'hclFl 'T( 'e SerIS] :TA(,T)7 1'1'1 ! ' I , ( ; C ' ( ' ) ' ~ (t'! Ie: : T r A l ~ u l l J . TCAS(rIS] 'TAATi',[

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    35/59

    way' trip into the host chromosome. Also, the barcode sequence would need to be insertedsomewhere into the phage chromosome, presumably at a location that is non-essential (possiblywithin the disrupted xis gene, for example). This overall process would be quite analogous tothat involved in making a generalized transducing phage, such as ",-gtll, and inserting thebarcode sequence as the payload.

    , ,JASON 2do3'

    III 1 r

    Alternative D.r. barcode vector schen* I"

    Use modified L5 mycobacteriophageto deliver barcode to phage ott site Step 1: Development of phage system

    - Cure bacterium of lysogen-Identify conditions for lysiS, lysogeny, etc Step 2: Construct tagging vector L5 mycobacteriophage- Barcode inserted in non-essentialregion (a la transducing phage)- Excision gene DR1455 (xis) inChromosome I to be deleted

    > ensures 1-way trip!DNA Barcodes &Watennat/(s Unclassified

    A JASON Idea: Fast, Adjustable Molecular Clocks

    D. radiodurons on petri plate36

    Beyond DNA barcodes and watermarks, which constitute fixed sequence tags fortracking and identifYing organisms, it might be useful to imbed manmade DNA sequences inorganisms for additional purposes. One intriguing possibility entertained by JASON was toinsert a special DNA sequence that (somehow) mutated harmlessly, but at a much faster rate thanthe normal genomic DNA. In principle, if the rate at which such a sequence accumulatedmutations were sufficiently high, it might be possible to score the number ofmutations acquired,and thereby to estimate the number ofgenerations that had passed since the organism wascreated and released. In effect, this hyper-mutating DNA sequence would function as astochastically ticking, molecular timer.

    Slower molecular 'clocks' based on DNA sequence comparisons have long been used toestablish phylogenetic trees relating organisms on evolutionary time scales. Such comparisonsform the experimental basis for modem molecular approaches to evolution, in fact. However,the rate at which natural sequences diverge by a process ofneutral mutation is quite slow, about1% to 5% per million years. Sequences diverge more quickly when subjected to selectionpressures, but they tend to do so in non-random and unpredictable ways. To compute generationnumber on a timescale ofmonths-to-years using information from sequence divergence wouldrequire that the DNA mutate several orders ofmagnitude faster than is normal. Furthermore,

    33

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    36/59

    these high rates ofmutation would need to be confined to the clock sequence region, allowingthe rest ofthe organism to mutate normally.,

    JASON 2063: III I ,

    I!JASON idea: Adjustable molecular clocksf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. Rates of change in DNA by neutral mutation are stochastic,[ and typically in the range of 15% perMegayear r - - - " - - - - - - - , - - , c - = : : - - ~ - Rates are fairly stable over geological time, andtherefore constitute good 'molecular docks'for the study of biological evolution- They can be caltbrated- They form the basis of modern phylogenetic trees- But they change much too slowly Genetic mutations that are selected lead to DNAvariations that can be many orders of magnitudefaster or slower, but in unpredictable ways.- These do not serve as very good docks We seek a neutral clock that ticks (mutates) fast- This could be used to count the number ofgenerations since the release of a barcoded organism.- ONLY the clock DNA. should tick fast, not the rest of the genomeDNA 88/COdes &WalfJnn8rlcs Unclassified 37

    Could some kind ofhyper-mutating sequence be developed based on currentbiotechnology? And could the accelerated mutagenesis rates be confined as required within thespecial sequence? Recent experimental findings suggest that such a thing may indeed bepossible, and we present two approaches that offer some degree of promise along these lines.A Clock Based on Tandem Repeats

    First, it may be possible to take advantage of he fact that intermolecular recombinationoccurs much more rapidly between homologous regions ofDNA than between unrelated regions.Specifically, recombination rates have been found to increase from 10-100 fold within directstretches of short sequence repeatsJ3 DNA rearrangements by direct repeats represent a leadingcause ofgenomic instability in E. COli14. Lengthy direct repeats of he trinucleotide sequence(CTG)n, with n = 165, have a recombination rate 60-fold higher than shorter stretches with n=17. [Trinucleotide repeat sequences are ofparticular experimental interest because these alsoappear in certain human genetic diseases, such as myotonic dystrophy.] Furthermore, some shortsequence repeats appear to be more active in promoting intramolecular recombination thanothers, with long stretches of(CTG)n, in particular, presenting hot spots for geneticrecombination. It might be possible to place such hot spots into a bacterial chromosome for thepurpose of introducing a molecular clock. However, it is by no means clear whether such13 M. Napierala et al. Long CTGCAG repeat sequences markedly stimulate intermolecular recombination. J BioI.Chem. 277:34087-34100 (2002).14 X. Bi and L.F. Liu, A replicational model for DNA recombination between direct repeats. J Mol. Bioi. 256:849-858 (1996).

    34

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    37/59

    introduced sequences would have the desired properties. A considerable amount of exploratorywork would be required to determine the actual levels of instability of specific repeat sequences(Do they mutate fast enough? Are they lost as well as gained?) as well as any possible sideeffects (Do they affect DNA repair mechanisms or other functions vital to the cell?) before theycould be considered as candidate for a molecular clock.

    , , ,J A S O ~ 2003" I, I I

    Fast molecular clocks via repeats... ...... . .................... .

    Deletion of tandem repeat byintramolecular recombination orstrand slippage- freq from 1Q-5/day (14 bp)to 103/day (> 100 bp) Recombination reduced 10100 .,. ~ " : " l ~ i ! ~ ~ u ~ ~ c ~ d g ~ n ~ y i ~ ~ ~ ~ i ~ i ~ g j ~ : : : : - - ~ " ~ f : L ~ : 1 . E 0 4 ~ - : - ~ ~ l - ' ; f r j e ~ ( G . A G M c . . Recombination increased 10-100 l i 1 . E 0 5 ~ ~ - ~ : m e r - ( e r - G ) r i __E 1 E . 0 6 ~ : " " ~ , , , I L - ,;,.- eO, h' i .fold with lengthy tn-nucleotide g 1 : E . 0 7 > : ~ j : J \ i ' ; ' d c r i i r [ : i ' l t : : repeats, e.g., tracts of (CAG)n .. 0 200 400 600 800 Frequency of loss is proportional repeat length (bp)to the number of generations (8i a Liu, JMB '96) -----'

    DNA BllI'COdes &W8lennarlcs Unclassified (Hapierala et at., JBC '02) 40

    A Better Clock Based on a Directed, Error-Prone PolymerasePerhaps a better possibility for developing a fast molecular clock was presented by therecent discovery by L. Loeb and coworkers15 (Univ. Washington) that an error-prone mutant ofpolymerase I (Pol I) will introduce mutations at a high rate into a specific, predefined target

    region in E. coli. The measured rate of introduction of errors is particularly well-suited for clockpurposes, because it is x lO4mutationslbp, i.e., every generation will introduce approximatelyone mutation into each kilobase ofthe target DNA region. Moreover, mutations were found tobe quite evenly distributed throughout the target zone. The target is defined by its location on aplasmid downstream from a specific type of origin of replication (ColEl Ori), which differs fromthe origin found on the main chromosome, and also from the origin found on the plasmid vectorencoding the mutant Poll. The plasmid region affected by mutagenesis can be up to 4 kb long ormore (which was a surprising finding, since it was previously believed that replication by adifferent, more faithful polymerase, Pol III, should begin to take over from Pol I after 400-500 ntfrom the origin). Evidently, neither the host chromosome nor the plasmid with the non-ColE!origin suffers an enhanced rate ofmutation, because Pol I is not involved in the replicationprocess for either of hese DNA molecules.

    15 M. Camps et aI. Targeted gene evolution in Escherichia coli using a highly error-prone DNA polymerase LProc. Nat!. Acad. Sci, USA. 100:9727-9732 (2003)

    35

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    38/59

    A molecular clock based on this approach would work roughly as follows. The gene forthe mutant pol I would be introduced into the species of interest, perhaps on a plasmid vector.An engineered, non-coding target sequence DNA would be introduced on a small, selfreplicating plasmid containing the ColE! origin of replication. In addition, the target sequencecould be flanked by convenient PCR primer regions, just as barcode sequences are, to facilitatetheir subsequent amplification and interrogation.

    , IJAsoi" 2d03 I' I

    II l' II "

    An internal clock for tracking generations.............................._....

    ,------------- , Use a mutant polymerase with ahigh intrinsic error rate (-10"3/ bp) This polymerase uses a -1 kb targetregion for accelerated mutagenesis Place a designed, non-codingsequence in the target region Interrogate by PCR amplification anddigestion with several restrictionenzymes

    '--___ _=_I;_JS_2_0_0___ - - ' Confirm by sequencinglawrence loeb lab. U. WashingtonCamps et at.. PNAS. in press

    DNA88/Codes&Wllfennlllks

    Barcode and Watermark Considerations

    The accumulated number ofmutations is proportional to thenumber of generationsUnclassified 41

    . DNA barcodes and watermarks offer a unique, biotechnology-based approach forfollowing the growth, dispersal, transport, and the ecological relationships oftagged bacterialspecies. Clearly, an ability to track and monitor species using molecular genetics will not solvethe problem ofbioremediation in any way. However, barco ding should be viewed as animportant a4iunct to this process. Regardless ofwhether organisms used for bioremediationpurposes are naturally-occurring or genetically-enhanced, and regardless ofwhether suchorganisms are confined tightly within holding areas or allowed to roam freely in theenvironment, it will be vital to follow their progress. Unexpected things can, and do, happen.Examples include unwanted releases of organisms into certain environments, unanticipatedpopulation spikes or crashes, the development ofnew mutations, lateral gene transfer, undesiredeffects of selection pressures, emergence ofnew variants and species, new symbioses, and a hostof other possibilities. It therefore seems prudent to develop a means of closely following theprogress ofbioremediation at the level ofthe very organisms involved.

    Barcodes not only offer a powerful new way to track the ecology and species diffusion ofmicroorganisms, but also introduce a useful level of ransparency into the bioremediation processitself. This is desirable on many grounds, including for regulatory compliance and from a public

    36

  • 8/9/2019 JASON - DNA Barcodes and Watermarks

    39/59

    relations perspective. Barcodes can supply data that would next-to-impossible to obtain byalternative means. Finally, the combined use ofbarcodes with watermarks could potentiallyoffer a level oflegal protection for the DOE, for example, against false accusations thatmicroorganisms used for bioremediation purposes had somehow escaped and contaminated otherenvironments (since the origins of any tagged organisms could be unambiguously traced).

    I:[_ Pros Political Considerations! - Supplies 'species labels' for scientific trackingir > Growth, dispersal, transport, niches, blooms/die-outs . ~ . i: > Ni>nitor rates