Download - Targeted Genome Editing of Bacteria Within Microbial ... · 17/07/2020 · 56 editing within a community of wild-type microbes has not yet been reported12. 57 Here we show that specific

1

Targeted Genome Editing of Bacteria Within Microbial 1

Communities 2

3

Benjamin E. Rubin1,2,13, Spencer Diamond1,3,13, Brady F. Cress1,2,13, Alexander Crits-Christoph4, 4

Christine He1,2,3, Michael Xu1,2, Zeyi Zhou1,2, Dylan C. Smock1,2, Kimberly Tang1,2, Trenton K. 5

Owens5, Netravathi Krishnappa1, Rohan Sachdeva1,3, Adam M. Deutschbauer4,5, Jillian F. 6

Banfield1,3,6,7* & Jennifer A. Doudna1,2,8-12* 7

1Innovative Genomics Institute, University of California, Berkeley, CA, USA. 2Department of Molecular and 8

Cell Biology, University of California, Berkeley, CA, USA. 3Department of Earth and Planetary Science, 9

University of California, Berkeley, CA, USA. 4Department of Plant and Microbial Biology, University of 10

California, Berkeley, CA, USA. 5Environmental Genomics and Systems Biology Division, Lawrence 11

Berkeley National Laboratory, Berkeley, California, USA. 6Environmental Science, Policy and Management, 12

University of California, Berkeley, CA, USA. 7School of Earth Sciences, University of Melbourne, 13

Melbourne, Victoria, Australia. 8California Institute for Quantitative Biosciences, University of California, 14

Berkeley, CA, USA. 9Department of Chemistry, University of California, Berkeley, CA, USA. 10Howard 15

Hughes Medical Institute, University of California, Berkeley, CA, USA. 11Molecular Biophysics & Integrated 16

Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA 12Gladstone Institutes, 17

University of California, San Francisco, CA, USA. 13These authors contributed equally to this work: 18

Benjamin E. Rubin, Spencer Diamond, Brady F. Cress. 19

20

*e-mail: [email protected]; [email protected] 21

.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint

https://doi.org/10.1101/2020.07.17.209189

http://creativecommons.org/licenses/by-nc/4.0/

2

Knowledge of microbial gene functions comes from manipulating the DNA of individual 22

species in isolation from their natural communities. While this approach to microbial 23

genetics has been foundational, its requirement for culturable microorganisms has left the 24

majority of microbes and their interactions genetically unexplored. Here we describe a 25

generalizable methodology for editing the genomes of specific organisms within a 26

complex microbial community. First, we identified genetically tractable bacteria within a 27

community using a new approach, Environmental Transformation Sequencing (ET-Seq), 28

in which non-targeted transposon integrations were mapped and quantified following 29

community delivery. ET-Seq was repeated with multiple delivery strategies for both a nine-30

member synthetic bacterial community and a ~200-member microbial bioremediation 31

community. We achieved insertions in 10 species not previously isolated and identified 32

natural competence for foreign DNA integration that depends on the presence of the 33

community. Second, we developed and used DNA-editing All-in-one RNA-guided CRISPR-34

Cas Transposase (DART) systems for targeted DNA insertion into organisms identified as 35

tractable by ET-Seq, enabling organism- and locus-specific genetic manipulation within 36

the community context. These results demonstrate a strategy for targeted genome editing 37

of specific organisms within microbial communities, establishing a new paradigm for 38

microbial manipulation relevant to research and applications in human, environmental, 39

and industrial microbiomes. 40

41

Genetic mutation and observation of phenotypic outcomes are the primary means of deciphering 42

gene function in microorganisms. This classical genetic approach requires manipulation of 43

isolated species, limiting knowledge in three fundamental ways. First, the vast majority of 44

microorganisms have not been isolated in the laboratory and are thus largely untouched by 45

molecular genetics1. Second, emergent properties of microbial communities that may arise due 46

to interactions between their constituents, remain mostly unexplored2. Third, microorganisms 47



https://doi.org/10.1101/2020.07.17.209189


3

grown and studied in isolation quickly adapt to their new lab environment, obscuring their true 48

“wild type” physiology3. Since most microorganisms relevant to the environment, industry and 49

health live in communities, approaches for precision genome modification (editing) in community 50

contexts will be transformative. 51

Advances toward genome editing within microbial communities have included assessing 52

gene transfer to microbiomes using selectable markers4–9, microbiome manipulation leveraging 53

pre-modified isolates10, and use of temperate phage for species-specific integration of genetic 54

payloads11. However, a generalizable strategy for programmable organism- and locus-specific 55

editing within a community of wild-type microbes has not yet been reported12. 56

Here we show that specific organisms within microbial communities can be targeted for 57

site-specific genome editing, enabling manipulation of species without requiring prior isolation or 58

engineering. Using a new method developed for this study, Environmental Transformation 59

Sequencing (ET-Seq), we identified genetically accessible species within a nine-member 60

synthetic community and among previously non-isolated species in a 197-member bioremediation 61

community. These results enabled targeted genome editing of microbes in the nine-member 62

community using DNA-editing All-in-one RNA-guided CRISPR-Cas Transposase (DART) 63

systems developed here. The resulting species-specific editing provides the first broadly 64

applicable strategy for organism- and locus-specific genetic manipulation within a microbial 65

community, hinting at new emergent properties of member organisms and methods for controlling 66

microorganisms within their native environments. 67

68

ET-Seq detects genetically accessible microbial community members 69

Editing organisms within a complex microbiome requires knowing which constituents are 70

accessible to nucleic acid delivery and editing. We developed ET-Seq to assess the ability of 71

individual species within a microbial community to acquire and integrate exogenous DNA (Fig. 72

1a). In ET-Seq, a microbial community is exposed to a randomly integrating mobile genetic 73



https://doi.org/10.1101/2020.07.17.209189


4

element (here, a mariner transposon), and in the absence of any selection, total community DNA 74

is then extracted and sequenced using two protocols. In the first, we enrich and sequence the 75

junctions between the inserted and host DNA to determine insertion location and quantity in each 76

host. This step requires comparison of the junctions to previously sequenced community 77

reference genomes. In the second, we conduct low-depth metagenomic sequencing to quantify 78

the abundance of each community member in a sample (Extended Data Fig. 1a). Together these 79

sequencing procedures provide relative insertion efficiencies for microbiome members. To 80

normalize these data according to a known internal standard, we add to every sample a uniform 81

amount of genomic DNA from an organism that has previously been transformed with, and 82

selected for, a mariner transposon. In the internal standard, we expect every genome to contain 83

an insertion. The final output of ET-Seq is a fractional number representing the proportion of a 84

target organism’s population that harbored transposon insertions at the time DNA was extracted, 85

or insertion efficiency (Extended Data Fig. 1b). To facilitate the analysis of these disparate data, 86

we developed a complete bioinformatic pipeline for quantifying insertions and normalizing results 87

according to both the internal control and metagenomic abundance 88

(https://github.com/SDmetagenomics/ETsuite and Methods). Together the experimental and 89

bioinformatic approaches of ET-Seq reveal species-specific genetic accessibility by measuring 90

the percentage of each member of a given microbiome that acquires a transposon insertion. 91

92



https://doi.org/10.1101/2020.07.17.209189


5

93

Fig. 1 | ET-Seq for quantitative measurement of insertion efficiency in a microbial community. a, ET-94

Seq provides data on insertion efficiency of multiple delivery approaches, including conjugation, 95

electroporation, and natural DNA transformation, on microbial community members. In this illustrative 96

example, the blue strain is most amenable to electroporation (star). This data allows for the determination 97

of feasible targets and delivery methods for DART targeted editing. b, ET-Seq determined efficiencies for 98

known quantities of spiked-in pre-edited K. michiganensis (Klebsiella_1). Data shown is the mean of three 99

technical replicates. LOD is the lowest insertion fraction at which accurate detection of insertions is 100

expected and LOQ is the lower limit at which this fraction is expected to be quantifiable. Solid line is the fit 101

of the linear regression to the data not including zero (n = 4 independent samples) that is used to calculate 102



https://doi.org/10.1101/2020.07.17.209189


6

LOD and LOQ (slope = 0.2137; intercept = 1.813*10-6). c-d, ET-Seq determined insertion efficiencies in the 103

nine-member consortium (n = 3 biological replicates) with conjugative delivery shown as c, a portion of the 104

entire community and d, a portion of each species. Control samples received no exogenous DNA. Average 105

relative abundances of community constituents across conjugation samples (n = 6 independent samples) 106

are indicated in parentheses. LOD and LOQ are indicated in plots by short and long dashed lines 107

respectively. 108

109

ET-Seq was developed and tested on a nine-member microbial consortium made up of 110

bacteria from three phyla that are often detected and play important metabolic roles within soil 111

microbial communities (Supplementary Table 1). We initially endeavored to test the accuracy and 112

detection limit by adding to the nine-member community a known amount of a previously prepared 113

mariner transposon library of one of its member species, Klebsiella michiganensis M5a1 114

(Klebsiella_1). The ET-Seq derived insertion efficiencies were closely correlated to the known 115

fractions of edited K. michiganensis present in each sample (Fig. 1b). Using this data we 116

calculated a limit of detection (LOD) and limit of quantification (LOQ) for our approach (Methods). 117

The LOD suggests that a fraction of ≥ 8.4*10-6 of transformed cells out of the total community 118

would be detectable by ET-Seq. 119

Next, the mariner transposon vector was delivered to the nine-member community through 120

conjugation. We could measure conjugation reproducibly and quantitatively in the three species 121

that grew to make up over 99% of the community (Fig. 1c). We further normalized insertion 122

efficiency in each species according to its abundance so that their insertion efficiencies represent 123

insertion containing cells as a portion of total cells for each species (Fig. 1d). Even for 124

Paraburkholderia caledonica, which made up ~0.1% of the community, we could measure 125

insertions. We detected no insertions in the remaining community members, which was expected 126

given their extreme rarity in the community (less than 0.05%). 127



https://doi.org/10.1101/2020.07.17.209189


7

We next used ET-Seq to compare insertion efficiencies in the nine-member community 128

after transposon delivery by conjugation, natural transformation with no induction of competence, 129

or electroporation of the transposon vector. Together these approaches showed reproducible 130

insertion efficiencies above the LOD in five of the nine community members (Fig. 2a and Extended 131

Data Fig. 2). Additionally we could identify preferred delivery methods for some members in this 132

community context, such as electroporation being consistently reproducible for Dyella japonica 133

UNC79MFTsu3.2 while conjugation was not. These results show that ET-Seq can identify and 134

quantify genetic manipulation of microbial community members and reveal suitable DNA delivery 135

methods for each. 136

137



https://doi.org/10.1101/2020.07.17.209189


8

138

Fig. 2 | ET-Seq detection of insertion efficiency across multiple delivery approaches. a, ET-Seq 139

determined insertion efficiencies for conjugation, electroporation, and natural transformation on the nine-140

member consortium (n = 3 biological replicates). Only members with at least one positive insertion efficiency 141

value across the delivery methods are shown. Average relative abundance across all samples (n = 18 142

independent samples) is indicated in parentheses. b, Comparing delivery strategies across data from all 143

organisms. Cross bars indicate the mean value and whiskers denote the 95% confidence interval for the 144

mean (Conjugation n = 9; Electroporation n = 14; Natural Transformation n = 9). c, Comparison of insertion 145



https://doi.org/10.1101/2020.07.17.209189


9

efficiencies measured for natural transformation of K. michiganensis (Klebsiella_1) in isolated culture (n = 146

3 biological replicates) compared to K. michiganensis grown in the community context (n = 3 biological 147

replicates). 148

149

Notably, five organisms exhibited some degree of natural competency, although average 150

efficiencies were significantly lower for natural transformation than for delivery through 151

electroporation (ANOVA followed by Tukey's HSD; two-sided test; p = 0.0009) (Fig. 2b). To the 152

best of our knowledge, no isolates of the Klebsiella genus including K. michiganensis are known 153

to be naturally competent. We conducted a second experiment to compare the insertion efficiency 154

of K. michiganensis cultivated in isolation versus grown in the community context. ET-Seq 155

returned no values above the LOD for natural transformation of K. michiganensis in isolation, but 156

within the community ET-Seq returned values well into the quantifiable range (Fig. 2c). This 157

apparent emergent property of natural competence within a small synthetic community provides 158

tantalizing support for the possibility of community induced natural transformation, an idea suggested 159

in previous work, but experimentally unstudied due to lack of tools for measuring horizontal gene 160

transfer events within a community13,14. 161

162

Genetic accessibility of uncultivated species within an environmental microbiome 163

To test the potential for editing in a complex and environmentally realistic community that has not 164

been reduced to isolates, we conducted ET-Seq on a genomically characterized 197 member 165

bioreactor-derived consortium that degrades thiocyanate (SCN-), a toxic byproduct of gold 166

processing15. We sampled biofilm from the reactor and conducted ET-Seq with a panel of delivery 167

techniques: conjugation, electroporation, and natural transformation. Across ET-Seq replicates at 168

least one measurement above the LOD was identified for 15 members of the bioreactor 169

community. We also note that the transformed organisms make up ~87% of the bacterial fraction 170

by relative abundance (Fig. 3a, Extended Data Fig. 3). Ten of these organisms were species that 171



https://doi.org/10.1101/2020.07.17.209189


10

had not previously been isolated or edited (Supplementary Table 1), and overall members from 7 172

of the 12 phyla detected in this consortium were successfully transformed (Fig. 3b). This included 173

an Afipia sp. known to play an important role in the thiocyanate degradation process. Additionally, 174

one of the transposon recipients, Microbacterium ginsengisoli (Microbacterium_3), is a putative 175

host for Saccharibacteria, a candidate phyla radiation (CPR) organism that has been observed in 176

this reactor system16. Notably, members of the CPR are resistant to typical isolation techniques 177

due to heavy dependence on other community members, and little is known about the nature of 178

their likely symbiotic relationships with other organisms17. Here, ET-Seq has uncovered a 179

genetically tractable putative host organism, raising the possibility of genetically editing the host 180

to probe CPR/host symbiotic relationships within a complex microbial community. In this way, ET-181

Seq reveals genetic accessibility and the tools necessary to achieve it in previously 182

unapproachable and biologically important members of an environmentally relevant community. 183

184

Fig. 3 | ET-Seq detection of insertion efficiency in thiocyanate-degrading bioreactor. a, ET-Seq 185



https://doi.org/10.1101/2020.07.17.209189


11

determined insertion efficiencies for conjugation, electroporation, and natural transformation on the 186

thiocyanate-degrading bioreactor community (n = 3 biological replicates). Average relative abundance 187

across all samples is indicated in parentheses (n = 17 independent samples). b, A ribosomal protein S3 188

(rpS3) phylogenetic tree of all organisms in the thiocyanate-degrading bioreactor genome database. Only 189

community members receiving at least one insertion by conjugation, electroporation, or natural 190

transformation detectable above LOD are indicated by filled circles. Filled circles indicate success of 191

method and open circles indicate method was not detected. A star indicates genomically encoded SCN- 192

degradation capacity in organisms shown to be edited by at least one method. Tree was constructed from 193

an alignment of 262 rps3 protein sequences using IQ-TREE. 194

195

Targeted genome editing in microbial communities using CRISPR-Cas transposases 196

The ability to introduce genome edits to a single type of organism in a microbial community and 197

to target those edits to a defined location within its genome would be a foundational advance in 198

microbiological research and would have many useful applications. We reasoned that RNA-199

guided CRISPR-Cas Tn7 transposases could provide the ability to both ablate function of targeted 200

genes and deliver customized genetic cargo in organisms shown to be genetically tractable by 201

ET-Seq18–20 (Fig. 1a). However, the two-plasmid ShCasTn18 and three-plasmid VcCasTn19 202

systems are not amenable to efficient delivery within complex microbial communities or even 203

beyond E. coli due to their multiple plasmids. Since ET-Seq identified conjugation and 204

electroporation as broadly effective delivery approaches in the tested communities, we designed 205

and constructed all-in-one conjugative versions of these CasTn vectors that could be used for 206

delivery by either strategy (Fig. 4a and Methods). These DART systems are barcoded and 207

compatible with the same sequencing methods used for ET-Seq, and can be used to assay the 208

efficacy of CRISPR-Cas-guided transposition into the genome of a target organism. 209

210



https://doi.org/10.1101/2020.07.17.209189


12

211

Fig. 4 | Benchmarking all-in-one conjugal targeted vectors. a, Schematic of VcDART and ShDART 212

delivery vectors. b, Fraction of insertions that occur in a 60 bp window around the target site. Mean for 213

three independent biological replicates is shown as cross bars. c-d, Aggregate unique insertion counts (n 214

= 3 biological replicates) across the E. coli BL21(DE3) genome, determined by presence of unique barcodes, 215

using c, VcDART and d, ShDART. The inset shows a 60 bp window downstream of the target site. Insertion 216

distance downstream of the target site is calculated from the 3’ end of the protospacer. 217



https://doi.org/10.1101/2020.07.17.209189


13

218

We compared the transposition efficiency and specificity of the DART systems in E. coli 219

in order to select the most promising candidate for targeted genome editing in microbial 220

communities. VcDART and ShDART systems harboring GmR cargo with a lacZ-targeting or non-221

targeting guide RNA were conjugated into E. coli to quantify transposition efficiency, and target 222

site specificity was assayed using ET-Seq following outgrowth of transconjugants in selective 223

medium (Methods and Extended Data Fig. 4a). While ShDART yielded approximately tenfold 224

more colonies possessing insertions than ShDART (Extended Data Fig. 4), >92% of the 225

selectable colonies obtained using ShDART were off-target, compared to no detectable off-target 226

insertions for VcDART (Fig. 4b-d). Due to VcDART’s high target site specificity and the 227

undesirable propensity for ShCasTn to co-integrate its donor plasmid21,22, we focused on VcDART 228

to test the potential for targeted microbial community genome editing. 229

230

Targeted microbial community editing by programmable transposition 231

We reasoned that RNA-programmed transposition could be deployed for targeted editing of 232

specific types of organisms within a microbial consortium. ET-Seq had shown two species within 233

the nine-member community, K. michiganesis and Pseudomonas simiae WCS417, to be both 234

abundant and tractable by conjugation (Fig. 1d). We targeted both of these organisms using 235

conjugation to introduce the VcDART vector into the community with guide RNAs specific to their 236

genomes (Fig. 5a). Insertions were designed to produce loss-of-function mutations in the K. 237

michiganesis and P. simiae pyrF gene, an endogenous counterselectable marker allowing growth 238

in the presence of 5-fluoroorotic acid (5-FOA) when disrupted. The transposons carried two 239

antibiotic resistance markers conferring resistance to streptomycin and spectinomycin (aadA) and 240

carbenicillin (bla). Together the simultaneous loss-of-function and gain-of-function mutations 241

allowed for a strong selective regime. VcDART targeted to K. michiganensis or to P. simiae pyrF 242

followed by selection led to enrichment of these organisms to ~98% and ~97% pure culture 243



https://doi.org/10.1101/2020.07.17.209189


14

respectively (Fig. 5b). No outgrowth was detected when using a guide RNA that did not target 244

these respective microbial genomes. Recovered transformant colonies of K. michiganensis and 245

P. simiae analyzed by PCR and Sanger sequencing showed full length, pyrF-disrupting VcDART 246

transposon insertions 48-49 bp downstream of the guide RNA target site, consistent with 247

CRISPR-Cas transposase-catalyzed transposition events at the desired genomic location (Fig. 248

4c and 5c). These results demonstrate that targeted genome editing using DART enables genetic 249

manipulation of distinct members of a complex microbial community. This targeted editing of 250

microorganisms in a community context can also enable subsequent exploitation of modified 251

phenotypes. 252

253

254

Fig. 5 | Targeted editing in the nine-member consortium. a, Conjugative VcDART delivery into a 255

microbial community using species-specific crRNA, followed by selection for transposon cargo, facilitates 256

selective enrichment of targeted organisms. b, Relative abundance of nine-member community 257



https://doi.org/10.1101/2020.07.17.209189


15

constituents measured by 16S rRNA sequencing before conjugative VcDART delivery and after selection 258

for pyrF-targeted transposition in K. michiganensis or P. simiae (n = 3 biological replicates). * indicates no 259

growth detected in selective medium. c, Representative Sanger sequencing chromatogram of PCR product 260

spanning transposon insertion site at targeted pyrF locus in K. michiganensis (top) and P. simiae (bottom) 261

colonies following VcDART-mediated transposon integration and selection. Target-site duplications (TSD) 262

are indicated with dashed boxes. 263

264

Discussion 265

We have demonstrated organism- and locus-specific genome editing within a microbial 266

community, providing a new approach to microbial genetics and microbiome manipulation for 267

research and applications. ET-Seq revealed the genetic accessibility of organisms growing within 268

microbial communities, including ten microbial species that had not been previously isolated or 269

found to be genetically manipulated. The creation of all-in-one vectors encoding two naturally 270

occurring CRISPR-Cas transposon systems enabled comparison of their targeted genome editing 271

capabilities. These experiments showed that only one of the two systems, which we termed 272

VcDART, enabled precise RNA-programmable microbial genome editing. The ability to conduct 273

targeted genome editing of two bacteria within a nine-member synthetic community and to use 274

the introduced genetic changes as a means of organism isolation demonstrates a new approach 275

to microbiome manipulation. Traditionally, the combined steps of culturing an environmental 276

microbe, determining the ideal means to transform it, and implementing targeted editing could 277

take years or could fail altogether23. ET-Seq combined with VcDART compresses the pipeline for 278

establishing genetics in microorganisms to weeks and expands the diversity of organisms that 279

can be targeted beyond those that can be cultivated in isolation. 280

In addition to providing a roadmap for targeted microbial genome editing, ET-Seq can be 281

used to discover and analyze horizontal gene transfer in complex communities. In this study, we 282

observed unexpected community-dependent natural transformation in the nine-member 283



https://doi.org/10.1101/2020.07.17.209189


16

community and characterized the horizontal gene transfer events experimentally in the complex 284

microbiome of a thiocyanate-degrading bioreactor. In future experiments, multiple ET-Seq time 285

points could be taken in a community after delivery to measure directly both the portion of the 286

community receiving DNA and the persistence of gene transfer, rather than tracking horizontal 287

gene transfer using bioinformatics24 or indirect experimental methods4–9. Furthermore, as ET-Seq 288

is applied in increasingly diverse and complex environments, an atlas of editable taxa can be 289

created including optimal delivery approaches. To expand this dataset, we plan to apply ET-Seq 290

to new microbial communities being sampled for metagenomic sequencing, a natural pairing 291

because ET-Seq depends on availability of genome sequences for the component organisms. 292

In the future, tools are needed to create more generally applicable and persistent targeted 293

genome edits. The canonical approach involving antibiotic selection for edited bacteria is 294

infeasible in large complex communities, where many natural resistances exist25,26. Even in the 295

nine-member synthetic community used in this study, three antibiotics and counterselection were 296

necessary to achieve strict selection. Improved delivery strategies and alternative positive and 297

counter selection methods should enable more efficient editing. In the gut microbiome, porphyran 298

has been used successfully for selection of a spiked-in organism capable of utilizing this 299

compound27. Such approaches for more efficient microbial community editing will enable research 300

to answer fundamental questions as well as allow manipulation of agricultural, industrial, and 301

health-relevant microbiomes. The combination of ET-Seq and DART systems presented here 302

provide the foundation of the new field of in situ microbial genetics. 303

304

References 305

1. Steen, A. D. et al. High proportions of bacteria and archaea across most biomes remain 306

uncultured. ISME J. (2019) doi:10.1038/s41396-019-0484-y. 307

2. Pascual-García, A., Bonhoeffer, S. & Bell, T. Metabolically cohesive microbial consortia and 308



https://doi.org/10.1101/2020.07.17.209189


17

ecosystem functioning. Philos. Trans. R. Soc. Lond. B Biol. Sci. 375, 20190245 (2020). 309

3. Fux, C. A., Shirtliff, M., Stoodley, P. & Costerton, J. W. Can laboratory reference strains 310

mirror ‘real-world’ pathogenesis? Trends Microbiol. 13, 58–63 (2005). 311

4. Pukall, R., Tschäpe, H. & Smalla, K. Monitoring the spread of broad host and narrow host 312

range plasmids in soil microcosms. FEMS Microbiol. Ecol. 20, 53–66 (1996). 313

5. De Gelder, L., Vandecasteele, F. P. J., Brown, C. J., Forney, L. J. & Top, E. M. Plasmid 314

Donor Affects Host Range of Promiscuous IncP-1β Plasmid pB10 in an Activated-Sludge 315

Microbial Community. Appl. Environ. Microbiol. 71, 5309–5317 (2005). 316

6. Musovic, S., Oregaard, G., Kroer, N. & Sørensen, S. J. Cultivation-Independent 317

Examination of Horizontal Transfer and Host Range of an IncP-1 Plasmid among Gram-318

Positive and Gram-Negative Bacteria Indigenous to the Barley Rhizosphere. Applied and 319

Environmental Microbiology vol. 72 6687–6692 (2006). 320

7. Musovic, S., Klümper, U., Dechesne, A., Magid, J. & Smets, B. F. Long-term manure 321

exposure increases soil bacterial community potential for plasmid uptake. Environ. 322

Microbiol. Rep. 6, 125–130 (2014). 323

8. Klümper, U. et al. Broad host range plasmids can invade an unexpectedly diverse fraction 324

of a soil bacterial community. ISME J. 9, 934–945 (2015). 325

9. Ronda, C., Chen, S. P., Cabral, V., Yaung, S. J. & Wang, H. H. Metagenomic engineering 326

of the mammalian gut microbiome in situ. Nat. Methods 16, 167–170 (2019). 327

10. Farzadfard, F., Gharaei, N., Citorik, R. J. & Lu, T. K. Efficient Retroelement-Mediated DNA 328

Writing in Bacteria. bioRxiv (2020). 329

11. Hsu, B. B., Way, J. C. & Silver, P. A. Stable Neutralization of a Virulence Factor in Bacteria 330

Using Temperate Phage in the Mammalian Gut. mSystems 5, (2020). 331

12. Sheth, R. U., Cabral, V., Chen, S. P. & Wang, H. H. Manipulating Bacterial Communities by 332

in situ Microbiome Engineering. Trends Genet. 32, 189–200 (2016). 333

13. Wang, X. et al. Across Genus Plasmid Transformation Between Bacillus subtilis and 334



https://doi.org/10.1101/2020.07.17.209189


18

Escherichia coli and the Effect of Escherichia coli on the Transforming Ability of Free 335

Plasmid DNA. Current Microbiology vol. 54 450–456 (2007). 336

14. Borgeaud, S., Metzger, L. C., Scrignari, T. & Blokesch, M. The type VI secretion system of 337

Vibrio cholerae fosters horizontal gene transfer. Science 347, 63–67 (2015). 338

15. Kantor, R. S. et al. Genome-Resolved Meta-Omics Ties Microbial Dynamics to Process 339

Performance in Biotechnology for Thiocyanate Degradation. Environ. Sci. Technol. 51, 340

2944–2953 (2017). 341

16. Huddy, R. J. et al. Thiocyanate and organic carbon inputs drive convergent selection for 342

specific autotrophic Afipia and Thiobacillus strains within complex microbiomes. bioRxiv 343

2020.04.29.067207 (2020) doi:10.1101/2020.04.29.067207. 344

17. Castelle, C. J. et al. Biosynthetic capacity, metabolic variety and unusual biology in the 345

CPR and DPANN radiations. Nat. Rev. Microbiol. 16, 629–645 (2018). 346

18. Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. 347

Science 365, 48–53 (2019). 348

19. Klompe, S. E., Vo, P. L. H., Halpin-Healy, T. S. & Sternberg, S. H. Transposon-encoded 349

CRISPR–Cas systems direct RNA-guided DNA integration. Nature 571, 219–225 (2019). 350

20. Petassi, M. T., Hsieh, S.-C. & Peters, J. E. Guide RNA categorization enables target site 351

choice in Tn7-CRISPR-Cas transposons. bioRxiv 2020.07.02.184150 (2020) 352

doi:10.1101/2020.07.02.184150. 353

21. Rice, P. A., Craig, N. L. & Dyda, F. Comment on ‘RNA-guided DNA insertion with CRISPR-354

associated transposases’. Science 368, (2020). 355

22. Strecker, J., Ladha, A., Makarova, K. S., Koonin, E. V. & Zhang, F. Response to Comment 356

on ‘RNA-guided DNA insertion with CRISPR-associated transposases’. Science 368, 357

(2020). 358

23. Laurenceau, R. et al. Toward a genetic system in the marine cyanobacterium 359

Prochlorococcus. Access Microbiology 6, 23 (2020). 360



https://doi.org/10.1101/2020.07.17.209189


19

24. Soucy, S. M., Huang, J. & Gogarten, J. P. Horizontal gene transfer: building the web of life. 361

Nat. Rev. Genet. 16, 472–482 (2015). 362

25. Hu, Y. et al. Metagenome-wide analysis of antibiotic resistance genes in a large cohort of 363

human gut microbiota. Nat. Commun. 4, 2151 (2013). 364

26. Forsberg, K. J. et al. Bacterial phylogeny structures soil resistomes across habitats. Nature 365

509, 612–616 (2014). 366

27. Shepherd, E. S., DeLoache, W. C., Pruss, K. M., Whitaker, W. R. & Sonnenburg, J. L. An 367

exclusive metabolic niche enables strain engraftment in the gut microbiota. Nature 557, 368

434–438 (2018). 369

370



https://doi.org/10.1101/2020.07.17.209189


20

Methods 371

Plasmid construction and barcoding 372

For ET-Seq measurement of genetic tractability in community members, DNA encoding a non-373

targeted mariner transposon was delivered. The mariner transposon integrates into “TA” 374

sequences in recipient cells. For delivery of the mariner transposon, we made use of the 375

previously created pHLL250 vector, which contains an RP4 origin of transfer (oriT), AmpR, 376

conditional (pir+-dependent) R6K origin, and an AseI restriction site to facilitate depletion of vector 377

from DNA samples in ET-Seq library preparations1. Unique to each transposon on this vector is 378

a random 20 bp barcode sequence to aid in the discrimination of unique insertion events from 379

duplications of the same insertion due to cell division or PCR. 380

DART vectors were designed to encode all components required for delivery and editing 381

(Supplementary Table 2 and Extended Data Fig. 4). VcCasTn genes, crRNA, and Tn were 382

synthesized as gBlocks (IDT). pHelper_ShCAST_sgRNA was a gift from Feng Zhang (Addgene 383

plasmid #127921; http://n2t.net/addgene:127921; RRID:Addgene_127921) and was used to 384

clone ShCasTn genes and sgRNA. pDonor_ShCAST_kanR was a gift from Feng Zhang 385

(Addgene plasmid # 127924 ; http://n2t.net/addgene:127924 ; RRID:Addgene_127924) and was 386

used to clone the ShCasTn transposon. tns genes, cas genes, and crRNA/sgRNA were 387

consolidated into a single operon (with various promoters and transcriptional configurations) on 388

the same vector as the cognate transposon. The left end of the cognate Tn was encoded 389

downstream of the crRNA/sgRNA, followed by Tn cargo, barcode, and Tn right end. DART Tn LE 390

and RE were designed to include the minimal sequence that both included all putative TnsB 391

binding sites and was previously shown to be functional2,3. Specifically, VcDART LE (108 bp) and 392

RE (71 bp) each encompass three 20 bp putative TnsB binding sites, spanning from the edge of 393

the 8 bp terminal ends to the edge of the third putative TnsB binding site2. ShDART LE (113 bp) 394

spans the boundaries of the long terminal repeat and both additional putative TnsB binding sites, 395



https://doi.org/10.1101/2020.07.17.209189


21

while the RE (211 bp) encompasses the long terminal repeat and all four additional putative TnsB 396

binding sites3. 397

Vectors were cloned using BbsI (NEB) Golden Gate assembly of part plasmids, each 398

encoding different regions of the final plasmid. Of note, the backbone encodes RP4 oriT, AmpR, 399

conditional R6K origin, and an AsiSI+SbfI double digestion site for vector depletion during ET-400

Seq library preparations. A 2xBsaI spacer placeholder enabled spacer cloning with BsaI (NEB) 401

Golden Gate. A 2xBsmBI barcode placeholder was encoded immediately inside the Tn right end 402

and was used for barcoding as described below. Part plasmids were propagated in E. coli Mach1-403

T1R (QB3 Macro Lab). Golden Gate reactions for all-in-one vector assembly were purified with 404

DNA Clean & Concentrator-5 (Zymo Research) and electroporated into E. coli EC100D-pir+ 405

(Lucigen). 406

DART vectors were barcoded by BsmBI (NEB) Golden Gate insertion of random barcode 407

PCR product into the 2xBsmBI barcode placeholder using a previously reported method4 with 408

slight modifications. A 56-nt ssDNA oligonucleotide encoding a central tract of 20 degenerate 409

nucleotides (oBFC1397) was amplified with BsmBI-encoding primers oBFC1398 and oBFC1399 410

using Q5 High-Fidelity 2X Master Mix (NEB) in a six-cycle PCR (98°C for 1 min; six cycles of 98°C 411

for 10 s, 58°C for 30 s, and 72°C for 60 s; and 72°C for 5 min). Barcoding Golden Gate reactions 412

were purified with DNA Clean & Concentrator-5. To remove residual non-barcoded vector, 413

reactions were digested with 15 U BsmBI at 55°C for at least 4 hr, heat inactivated at 80°C for 20 414

min, treated with 10 U Plasmid-Safe ATP-Dependent DNase (Lucigen) exonuclease at 37°C for 415

1 hr, heat inactivated at 70°C for 30 min, and purified with DNA Clean & Concentrator-5. 416

Randomly barcoded conjugative vectors were electroporated into E. coli EC100D-pir+, 417

followed 1 hr recovery in 1 mL pre-warmed SOC (NEB) at 37°C 250 rpm, serial dilution and spot 418

plating on LB agar plus 100 μg mL-1 carbenicillin to estimate library diversity, and plating the full 419

transformation across 5 LB agar plates containing carbenicillin (and other appropriate antibiotics 420

when Tn cargo contained other resistance cassettes). To prepare barcoded conjugative vector 421



https://doi.org/10.1101/2020.07.17.209189


22

plasmid stock, all 5 agar plates were scraped into a single pool and midiprepped (Zymo 422

Research). All conjugations were performed using the diaminopimelic acid (DAP) auxotrophic 423

RP4 conjugal donor E. coli strain WM3064. Donor strains were prepared by electroporation with 424

200 ng barcoded vectors, followed by recovery in SOC plus DAP at 37°C and 250 rpm and 425

inoculation of the entire recovery culture into 15 mL LB containing DAP and carbenicillin in 50 mL 426

conical tubes, followed by overnight cultivation at 37°C and 250 rpm. Donor serial dilutions were 427

spot plated on LB agar plus carbenicillin to estimate final barcode diversity. 428

429

Guide RNA design 430

In all experiments, VcCasTn gRNAs used 32 nt spacers and a 5’-CC Type IF PAM, while 431

ShCasTn gRNAs used 23 nt spacers and a 5’-GTT Cas12k PAM. All gRNAs were designed to 432

bind in the first half of the target CDS to ensure functional knockout by transposon insertion 433

(Supplementary Table 3). Off-target potential was assessed using BLASTn (-dust no -word_size 434

4) of spacers against a local BLAST database created from all genomes present in an experiment, 435

and spacers were discarded if off-target hits with E-value < 15 were identified. gRNAs with less 436

seed region complementarity to off-targets were prioritized. Non-targeting gRNAs were designed 437

by scrambling the spacer until no significant matches were found. 438

439

Delivery methods 440

For natural transformation and electroporation, a culture of the community or isolate to be 441

transformed was subcultured at OD600 = 0.2 and grown to OD600 = 0.5. In the case of the 442

thiocyanate-degrading bioreactor in the absence of accurate OD measurements the culture was 443

outgrown for two hours. For natural transformation 200 ng of vector harboring the mariner 444

transposon (pHLL2501) for non-targeted insertion, or water for the negative control were added 445

to 4 mL of OD600 = 0.5 outgrowth. Cultures were incubated for 3 hours shaking at 250 rpm at 446



https://doi.org/10.1101/2020.07.17.209189


23

temperature appropriate for the isolate or community before being moved to the appropriate 447

downstream analysis. 448

For electroporation, 20 mL of the community or isolate at OD600 = 0.5 was put on ice, 449

centrifuged at 4,000g at 4°C for 10 minutes, and washed four times with 10 mL sterile ice-cold 450

Milli-Q H2O. After a final centrifugation the pellet was resuspended in 100 μL of 2 ng/μL vector 451

(pHLL250 or VcDART), or 100 μL of water as a negative control. This solution was then pipetted 452

into a 0.2 cm gap ice-cold cuvette and electroporated at 3 kV, 200Ω, and 25 μF. The cells were 453

immediately recovered into 10 mL of the community’s or isolate’s preferred medium and incubated 454

shaking for 3 hours before being moved to the appropriate downstream analysis. 455

E. coli strain WM3064 containing the mariner transposon (pHLL250) for non-targeted 456

editing, or the VcDART for targeted editing was cultured overnight in LB supplemented with 457

carbenicillin (100 μg/mL) and DAP (60 μg/mL) at 37°C. Before conjugation the donor strain was 458

washed twice in LB (centrifugation at 4,000g for 10 minutes) to remove antibiotics. Then, 1 459

OD600*mL of the donor was added to 1 OD600*mL of the recipient community or isolate and the 460

mixture was plated on a 0.45 μm mixed cellulose ester membrane (Millipore) topping a plate of 461

the recipient’s preferred media without DAP. In the case of the thiocyanate-degrading bioreactor, 462

~2 OD600*mL of the donor was added to 2 OD600*mL of the recipient community to ensure 463

sufficient material despite the community's slow growth. Plates were incubated at the ideal 464

temperature for the recipient community or isolate for 12 hours before the growth was scraped off 465

the filter into the media of the recipient community or isolate for downstream analysis. 466

467

ET-Seq library preparation 468

The insertion junction sequencing library prep strategy for ET-Seq can be used (modification may 469

be necessary) in any circumstance where high efficiency mapping of inserted DNA to a host loci 470

is desired. For our purposes, DNA of the edited community or isolate was first extracted using 471



https://doi.org/10.1101/2020.07.17.209189


24

the DNeasy PowerSoil Kit (QIAGEN). In the case of the nine-member community, 500 ng of DNA 472

was used for both insertion junction sequencing and metagenomic library prep. For the SCN 473

community, which had lower yields of DNA, 100 ng were used. As an internal standard, DNA 474

from a previously constructed mutant library of Bacteroides thetaiotaomicron VPI-54825, a 475

species not present in the nine-member community or the thiocyanate-degrading bioreactor, was 476

spiked into the community DNA at a ratio of 1/500 by mass. The B. thetaiotaomicron library had 477

undergone antibiotic selection for its transposon insertions and was thus assumed to represent 478

100% transformation efficiency (i.e. every genome contained at least one mariner transposon 479

insertion). 480

For metagenomic sequencing, library prep was conducted by the standard ≥100 ng 481

protocol from the NEBNext Ultra II FS DNA Library Prep Kit for Illumina (NEB). For insertion 482

junction sequencing, the same protocol was used with a number of modifications enumerated 483

here (Extended Data Fig. 1). This insertion junction sequencing protocol has also been tested 484

successfully with the ≤ 100 ng protocol of the NEBNext Ultra II FS DNA Library Prep Kit (NEB) 485

and the KAPA HyperPlus Kit (Roche). For fragmentation an 8 minute incubation was used. A 486

custom splinklerette adaptor was used during adaptor ligation to decrease non-specific 487

amplification (Supplementary Table 4)6,7. For size selection 0.15X (by volume) SPRIselect 488

(Beckman Coulter, Cat # B23318) or NEBNext Sample Purification Beads (NEB) were used for 489

the first bead selection and 0.15X (by volume) were added for the second. From this selection, 490

the DNA was eluted in 44 μL (instead of the suggested 15 μL) where it undergoes digestion before 491

enrichment to cleave intact transposon delivery vector. All bead elutions were performed with 492

Sigma Nuclease-Free water. pHLL250 underwent AseI digestion, while DART vectors underwent 493

double digestion by AsiSI and SbfI-HF (NEB) (Supplementary Table 2). The DNA then underwent 494

a sample purification using 1X AMPure XP beads (Beckman Coulter) to prepare it for PCR 495

enrichment. 496



https://doi.org/10.1101/2020.07.17.209189


25

In PCR enrichment, the transposon junction was amplified by nested PCR. The PCRs 497

followed the NEBNext Ultra II FS DNA Library Prep Kit for Illumina (NEB) PCR protocol, however 498

in the first PCR the primers were custom to the transposon and the adaptor and the PCR was run 499

for 25 cycles (Supplementary Table 4). The enrichment then underwent sample purification with 500

a 0.7X size selection using SPRIselect or NEBNextSample Purification Beads from which 15 μL 501

were eluted for the second PCR. This second PCR used custom unique dual indexing primers 502

specific to nested regions of the insertion and adaptor and 6 cycles are used (Supplementary 503

Table 4). Then another 0.7x size selection was conducted and the final library was eluted in 30 504

μL. Samples for metagenomic sequencing and insertion junction sequencing were then quality 505

controlled and multiplexed using 1X HS dsDNA Qubit (Thermo Fisher) for total sample 506

quantification, Bioanalyzer DNA 12000 chip (Agilent) for sizing, and qPCR (KAPA) for 507

quantification of sequenceable fragments. Samples were sequenced on the iSeq100 or 508

HiSeq4000 platforms. 509

510

Genome sequencing, assembly, taxonomic classification, and database construction 511

For a full list of genome sequences used as read mapping references in this study see 512

Supplementary Table 1. Assembly and annotation of genomes used as references for the SCN 513

bioreactor experiment is described in Huddy et al.8. As the SCN bioreactor has been subjected to 514

numerous genome-resolved metagenomic studies8,9 we endeavored to create a non-redundant 515

database that contained all genomes previously observed in this reactor system. A set of 556 516

genomes assembled from this system were de-replicated at the species level using dRep v2.5.310 517

with an average nucleotide identity (ANI) threshold of 95% and a minimum completeness of 60% 518

as estimated by checkM v1.1.211. A single genome representing each species level group was 519

chosen by dRep based on optimizing genome size, fragmentation, estimated completeness, and 520

estimated contamination resulting in 265 representative genomes. Genomes were taxonomically 521

classified using GTDB-Tk12 with default options. Additionally, to display the taxonomic diversity of 522



https://doi.org/10.1101/2020.07.17.209189


26

transformed organisms (Fig. 3b), a phylogenetic tree was constructed using ribosomal protein S3 523

(rpS3). Briefly, a custom Hidden Markov Model (HMM) was used to identify rpS3 sequences13 in 524

the 265 representative genomes, and 262 rpS3 sequences were successfully identified. 525

Sequences were aligned using muscle14, and a maximum likelihood phylogenetic tree was 526

constructed using IQ-TREE15. Phylogenetic trees were pruned and annotated using iTol v5 527

(https://itol.embl.de/). To determine if an organism we identified as receiving exogenous DNA was 528

ever previously isolated we used the ANI relative to the closest reference genome in the GTDB-529

Tk output. If one of our genomes had an ANI ≥ 95% relative to a known reference, and this 530

reference genome was generated from an isolated bacterium, our target organism was 531

considered to be previously isolated (Supplementary Table 1). 532

For 9-member community genomes assembled as part of this study, cultures were grown 533

on R2A medium for 24 hours at 30°C and genomic DNA was extracted with the DNeasy Blood 534

and Tissue DNA Kit (Qiagen) with pre-treatment for Gram-positive bacteria. Genomic DNA was 535

sheared mechanically with the Covaris S220 and processed with the NEBNext DNA Library Prep 536

Master Mix Set for Illumina (NEB) before submitting for sequencing on an Illumina MiSeq platform 537

generating paired end 150 bp reads. Raw sequencing reads were processed to remove Illumina 538

adapter and phiX sequence using BBduk with default parameters, and quality trimmed at 3’ ends 539

with Sickle using default parameters (https://github.com/najoshi/sickle). Assemblies were 540

conducted using IDBA-UD v1.1.116 with the following parameters: –pre_correction –mink 30 –541

maxk 140 –step 10. Following assembly, contigs smaller than 1 kbp were removed and open 542

reading frames (ORFs) were then predicted on all contigs using Prodigal v2.6.317. 16S ribosomal 543

rRNA genes were predicted using the 16SfromHMM.py script from the ctbBio python package 544

using default parameters (https://github.com/christophertbrown/bioscripts). Transfer RNAs were 545

predicted using tRNAscan-SE18. The full metagenome samples and their annotations were then 546

uploaded into our in-house analysis platform, ggKbase, where genomes were manually curated 547



https://doi.org/10.1101/2020.07.17.209189


27

via the removal of contaminating contigs based on aberrant phylogenetic signatures 548

(https://ggkbase.berkeley.edu). 549

For each ET-Seq experiment a genomic database is constructed using the ETdb 550

component of the ETsuite software package. Each database contains the nucleotide sequences 551

of the expected organisms in a sample, any vectors used, any conjugal donor, and the spike in 552

control organism. Briefly, all genomic sequences are formatted into a bowtie2 index to allow read 553

mapping, a tabular correspondence table between all scaffold names and their associated 554

genome is constructed (scaff2bin.txt), and a table (genome_info.txt) of standard genomic 555

statistics is calculated including genome size, GC content, and number of scaffolds. Following 556

database construction, a label is manually added to each entry in the genome info table to indicate 557

if the entry represents a target organism, a vector, or a spike in control organism. All data are 558

propagated into a single folder that can be used by the ETmapper software for downstream 559

mapping and analysis. 560

561

Identification and quantification of insertion junctions and barcodes 562

To identify and map transposon insertion junctions and their associated barcodes in a mixed 563

population of microbial cells, reads (150 bp X 2) generated from PCR amplicons of putative 564

transposon insertion junctions are first processed using the ETmapper component of the ETsuite 565

software package implemented in R with the following steps: First reads are quality trimmed at 566

the 3’ end to remove low quality bases (Phred score ≤ 20) and sequencing adapters using 567

Cutadapt v2.1019. Cutadapt is then used to identify and remove provided transposon model 568

sequences from the 5’ end of forward reads, requiring a match to 95% of the shortest transposon 569

sequence in a provided set and allowing a 2% error rate. Read pairs where no transposon model 570

sequence is identified in the forward read are discarded. All identified and trimmed transposon 571

models are paired with their respective reads, stored, and barcodes are identified in these 572

sequences by searching for a known primer binding site sequence flanking the 5’ end of the 573



https://doi.org/10.1101/2020.07.17.209189


28

barcode (5’-CTATAGGGGATAGATGTCCACGAGGTCTCT-3’) allowing for 1 mismatch. 574

Subsequently, the 20 bp region following the known primer binding site is extracted as the barcode 575

sequence and associated with its respective read. The 3’ end of the paired reverse reads are then 576

trimmed to remove any transposon model sequence using Cutadapt, and only read pairs where 577

one mate is at least ≥ 40 bp following all trimming are retained for downstream mapping and 578

analysis. The fully trimmed paired end reads now consisting of only genomic sequence following 579

the transposon insertion site are mapped to the ETdb database used in a given experiment using 580

bowtie2 with default options20. Mapped read files are converted into a hit table indicating the 581

mapped genome, scaffold, genomic coordinates, mapQ score, and number of alignment 582

mismatches for each read in a pair using a custom Python script, bam_pe_stats.py, provided with 583

ETsuite. This table is then merged with read-barcode assignments to generate a final hit table 584

with the mapping information about each read pair, the transposon model identified, and the 585

associated barcode found for that read pair. Finally mapped read pairs are only retained for 586

downstream quantification if both reads map to the same genome, at least one mapped read in a 587

pair has a mapQ score ≥ 20, and a barcode was successfully identified and associated with the 588

read pair. 589

To quantify the number of unique barcodes and their associated reads mapping to 590

organisms in each sample of an experimental run, the filtered hit tables were processed using the 591

ETstats component of the ET-Seq software package with the following steps: Initially, all barcodes 592

identified across all samples in an experiment are aggregated and clustered using Bartender21 593

with the following supplied options: -l 4 -s 1 -d 3. Barcode clusters and their associated 594

barcodes/reads were only retained if all of the following criteria were true: (1) ≥ 75% of the reads 595

in a cluster mapped to one genome (the majority genome), (2) ≥ 75% of the reads in a cluster 596

were associated with the same transposon model (the majority model), and (3) the barcode 597

cluster had at least 2 reads. Subsequently, when quantifying reads and barcodes in each sample 598

of an experiment, the genome a read was mapped to and the transposon model it was associated 599



https://doi.org/10.1101/2020.07.17.209189


29

with had to agree with the majority assignments for the barcode cluster assigned to that read’s 600

barcode to be counted. Finally, we were aware that Illumina patterned flow cell related index 601

swapping would result in reads from a barcode cluster being misassigned across samples, even 602

when using unique dual indexing22. We could not simply limit barcode clusters to be associated 603

with only one sample, as our spike in control organisms contain the same pool of barcodes and 604

are added to every sample. Thus we estimated an empirical index swap rate across each 605

experiment and required that the number of reads (X) for a barcode to be positively identified in 606

a sample be always ≥ 2 and ≥ the binomial mean of observed read counts expected in any sample 607

for a barcode cluster with (R) reads across (N) samples based on the estimated swap rate (S) + 608

2 standard deviations (Eqn. 1) 609

610

Eqn. 1: 𝑋 ≥ $𝑅 × '!"() + 2 × ,𝑅 × (1 − 𝑆) × 𝑆&𝑋 ≥ 2 611

612

The index swap rate for an experiment was empirically estimated from barcode clusters assigned 613

only to target organisms based on the assumption that it would be highly unlikely for a barcode 614

cluster to have truly originated from independent integration events into the same organism in 615

more than one sample. Thus we assumed that for each barcode cluster associated with target 616

organisms, the majority of reads originated from the true sample and reads assigned to other 617

samples represented swaps. This is opposed to barcode clusters associated with our spike-in 618

organism, conjugal donor organism, or vectors which contain the same pool of barcodes directly 619

added to multiple samples. To identify swapped read counts we first quantify the total count of all 620

reads assigned to the majority genome across barcode clusters but that are not associated with 621

the majority sample of that cluster (E). Then we quantify the total count of reads associated with 622

the majority genome and associated with the majority sample across all clusters (C). Then 623



https://doi.org/10.1101/2020.07.17.209189


30

experiment wide swap rate was estimated by dividing the total number of reads not associated 624

with majority samples by the total number of reads (Eqn. 2) 625

626

Eqn. 2: 𝑆 = #(#&')

627

628

Following filtering, a hit table is returned that indicates for each genome in each sample, the 629

number of unique barcode clusters that were recovered, and the total number of reads associated 630

with these barcodes. As a final check for false positives during ET-Seq development we included 631

an organism genome as an ETsuite mapping target, Sinorhizobium meliloti, which was not 632

physically included in our 9-member synthetic or thiocyanate-degrading communities. We did not 633

detect any barcodes or reads associated with this genome. 634

635

Metagenomic data processing and coverage calculation. 636

Each ET-Seq sample is split and in parallel undergoes shotgun metagenomic sequencing to 637

determine the relative quantities of organisms present in the sample at the time of sampling. Raw 638

read files from metagenomic data are also processed using the ETmapper component of the 639

ETsuite software package with the following steps: First reads are quality trimmed at the 3’ end 640

to remove low quality bases (Phred score ≤ 20) and sequencing adapters using Cutadapt v2.1019. 641

Read pairs where at least one mate is not ≥ 40 bp in length are discarded. Trimmed read pairs 642

are mapped to the ETdb database used in a given experiment using bowtie220 with default 643

parameters. Mappings are filtered to require a minimum identity ≥ 95% and minimum mapQ score 644

≥ 20, and coverage is calculated using a custom script, calc_cov.py, included with the ETsuite 645

software. 646



https://doi.org/10.1101/2020.07.17.209189


31

Metagenomic sequencing for one biological replicate of the thiocyanate-degrading bioreactor 647

community (Conjugation - Control Sample - Replicate 3) failed. Metagenomic coverage values for 648

this replicate were generated by averaging the values from the other two biological replicates. 649

650

ET-Seq normalization and calculation of insertion efficiency. 651

To account for differences in sequencing depth, transposon junction PCR template amount, and 652

relative abundance of microbes in a community the data generated from both ET-Seq and 653

shotgun metagenomics were each normalized independently to values from the spike in control 654

organism, B. thetaiotaomicron, and then ET-Seq data is subsequently normalized by 655

metagenomic abundance as follows: Initially read count tables from ET-Seq and metagenomics 656

are filtered to remove any ET-Seq read count associated with < 2 barcodes and any metagenomic 657

read count < 10 reads. Next a size factor for each sample is calculated based on the geometric 658

mean of B. thetaiotaomicron reads for ET-Seq samples and B. thetaiotaomicron coverage for 659

metagenomics samples. ET-Seq read counts and metagenomic coverage values are then divided 660

by their respective sample size factors to create normalized values. Normalized ET-Seq read 661

counts are then divided by their paired normalized metagenomic coverage values to generate ET-662

Seq read counts that are fully normalized to both ET-Seq sequencing depth and metagenomic 663

coverage. Finally fully normalized ET-Seq read counts for target organisms are divided by the 664

fully normalized ET-Seq read count of B. thetaiotaomicron from an experiment (a constant that 665

represents the number of reads that would be obtained from an organism with 100% of its 666

chromosomes carrying insertions). The resulting values for each target organism in a sample 667

represent an estimate of the fraction of that organism’s population that received insertions 668

(Insertion Efficiency). Additionally, we multiply a target organism’s insertion efficiency by the 669

fractional relative abundance of that organism in a sample, based on metagenomic data, to 670

estimate the fraction of an entire sample population that is made up of cells of a given species 671

that received insertions (Insertion-Receiving Fraction in Total Community). 672



https://doi.org/10.1101/2020.07.17.209189


32

673

ET-Seq validation and establishing limits of detection and quantification. 674

To validate ET-Seq and establish both a limit of detection (LOD) and limit of quantification (LOQ) 675

for the assay, a library of K. michiganensis transposon mutants was constructed by antibiotic 676

selection following conjugation with pHLL250 (as described above), and this library was added to 677

untransformed samples of the combined nine-member community to create a transformed cell 678

concentration gradient. Technical triplicate samples were created where 1%, 0.1%, 0.01%, 679

0.001% and 0% of the total K. michiganensis cells (by OD600) in the mixture were those derived 680

from the transformed library. All samples (n = 15) were subjected to ET-Seq (as described above), 681

and pooled samples across all concentrations for each technical triplicate (n = 3; 5 concentrations) 682

were analyzed for community composition using shotgun metagenomics (as described above). 683

ET-Seq insertion efficiencies and insertion-receiving fraction in total community values were 684

averaged across technical replicates. Additionally, to derive the fraction of transformed K. 685

michiganensis cells that made up the total community (not just the K. michiganensis sub-686

population), the known fraction of K. michiganensis cells that were transformed in a sample was 687

multiplied by the measured relative abundance of K. michiganensis in a given technical replicate, 688

and these values were averaged across technical replicates. 689

To derive the LOD and LOQ for ET-Seq a linear regression was performed using the lm 690

function in the base package of R23 using the known fraction of transformed K. michiganensis 691

cells that made up the total community as the independent variable and the ET-Seq estimated 692

per community insertion efficiency as the dependent variable. The sample where transformed K. 693

michiganensis made up 0% of the community was not included in the regression analysis, but 694

was reserved to demonstrate zero response with no transformed cells present. LOD was 695

calculated as 3.3 * standard error of the regression / slope. The LOQ was calculated as 10 * 696

standard error of the regression / slope. 697

698



https://doi.org/10.1101/2020.07.17.209189


33

Identification of positive transformations and statistical analysis 699

For all ET-Seq experiments conducted we initially determined if any ET-Seq estimated per 700

community insertion efficiency was larger than the LOD. Values larger than the LOD constituted 701

a positive detection. For comparative statistical analysis conducted to compare insertion 702

efficiencies between transformation methods (Fig. 2b) only values that had a corresponding 703

insertion-receiving fraction in total community > LOQ were used. Statistical testing was conducted 704

using Analysis of Variance (ANOVA) implemented in the aov function in R23. Post-hoc testing was 705

conducted using the TukeyHSD function in R. Traditional 95% confidence intervals were 706

calculated using the groupwiseMean function of the rcompanion package in R. 707

708

Multiple delivery experiments in communities. 709

To test multiple delivery methods on the nine-member community, all members were grown at 710

30°C with Bacillus sp. AnTP16 and Methylobacterium sp. UNC378MF in R2A liquid media while 711

all other members were inoculated in LB. Equal amounts of community members were then 712

combined by OD600. This consortium then underwent transformation (of pHLL250), conjugation 713

(pHLL250 in WM3064), and electroporation of the pHLL250 vector (described in Delivery Methods 714

section). After delivery the community was spun down at 5,000g for 10 minutes, washed once 715

with LB and then spun down and frozen at -80°C until genomic DNA extraction. 716

The thiocyanate-degrading microbial community was sampled for delivery testing from biofilm on 717

a four liter continuously stirred tank reactor that had been maintained at steady state for over a 718

year. The reactor is operated with a two day hydraulic residence time, sparged with laboratory air 719

at 0.9 L/min, and fed with a mixture of molasses (0.15% w/v), thiocyanate (250 ppm), and KOH 720

to maintain pH 7. OD measurements were not feasible on the biofilm so we used its wet mass to 721

approximate equivalent OD and thus cell numbers to those used for the nine-member community. 722

This community underwent the same transformation, electroporation, and conjugation delivery 723

approaches as the nine-member community, however in all steps requiring media, LB was 724



https://doi.org/10.1101/2020.07.17.209189


34

replaced with molasses media (no thiocyanate). After delivery the community was spun down at 725

5,000g for 10 minutes, washed once with molasses media and then spun down and frozen at -726

80°C until genomic DNA extraction. 727

728

Benchmarking DART systems in E. coli. 729

We first constructed several DART systems to identify variants capable of efficient transposition 730

by conjugative delivery to E. coli. We performed parallel conjugation of each DART vector variant 731

containing GmR Tn cargo (2.1 kbp) and either a non-targeting gRNA or one of two lacZ-targeting 732

gRNAs for each system. For VcDART, variation of the promoter controlling the expression of 733

VcCasTn components did not significantly impact transposition efficiency (Extended Data Fig. 4c-734

d). Similarly for ShDART, expression of the sgRNA in three distinct transcriptional configurations 735

did not significantly impact transposition efficiency (Extended Data Fig. 4e-f). Since promoter and 736

transcriptional configuration variation had insignificant effects on transposition efficiency--and to 737

remove the requirement for promoter induction and reliance on T7 RNA polymerase--we 738

performed target specificity benchmarking of VcDART and ShDART using the same constitutive 739

Plac promoter. In this experiment, ShDART Cas and Tns genes and sgRNA were encoded in the 740

original transcriptional configuration and under control of the same promoter in which ShCasTn 741

was first characterized by Strecker et al.3. 742

The lacZ-targeting gRNAs were designed to target the lacZ α-peptide present in the 743

conjugation recipient strain E. coli BL21(DE3) but absent in the lacZΔM15 strains used as cloning 744

host (E. coli EC100D-pir+) or conjugation donor (E. coli WM3064), preventing transposition until 745

delivery into the recipient cell (Extended Data Fig. 4a). Donor WM3064 strains were transformed 746

and cultivated as described above, and recipient BL21(DE3) was inoculated from glycerol stock 747

into 100 mL LB in a 250 mL baffled shake flask at 37°C 250 rpm. Conjugations were performed 748

as described above using LB medium and 37°C incubation for every step, except that 0.1 mM 749

IPTG was added to VcDART conjugation plates in Extended Data Fig. 4d to induce transcription 750



https://doi.org/10.1101/2020.07.17.209189


35

from PT7-lac and T7 RNA polymerase expression in E. coli BL21(DE3). Transposition efficiencies 751

were calculated as the percentage of colonies resistant to 10 μg mL-1 gentamycin relative to viable 752

colonies in absence of gentamycin. 753

On/off-target analysis was performed for one lacZ-targeting guide for each DART system 754

by outgrowth under selection followed by genomic DNA extraction and ET-Seq. Specifically, 755

approximately 10,000 transconjugant cfu were plated on LB agar with gentamycin, incubated at 756

37°C overnight, scraped from agar into liquid LB medium, diluted to OD600 = 0.25 into 10 mL LB 757

plus gentamycin in 50 mL conical tubes, incubated at 37°C 250 rpm until OD600 = 1.0, centrifuged 758

at 4,000g, and frozen for downstream analysis. To determine the percent of selectable transposed 759

colonies possessing on-target and off-target edits, the total number of selectable colonies was 760

adjusted (Extended Data Fig. 4b) for on-target and off-target percent as determined by ET-Seq 761

(Fig. 4b). ET-Seq analysis was conducted on triplicate platings of DART transconjugants (n = 3 762

for each system) to identify transposon insertion locations and quantify on-target vs. off-target 763

insertions. As the targeted genomic region encoding the lacZ α-peptide is duplicated in E. coli 764

BL21(DE3), one of the two duplicated regions (749,903 bp --> 750,380 bp) was removed prior to 765

analysis to allow unambiguous mapping assignment. Subsequently, the standard ETsuite 766

analysis pipeline (as described above) was used to identify and map 300 bp X 2 reads containing 767

transposon junctions back to the recipient BL21(DE3) genome and cluster barcodes that 768

corresponded to unique insertion events. To confirm an insert location we first identified the exact 769

transposon-genome junction mapping coordinate that was the most frequent in the reads of a 770

barcode cluster (prime location) then required that a barcode cluster had: (1) at least 75% of its 771

reads coming from within 3 bp of the prime location and (2) at least 75% of its reads mapping to 772

the same strand. If these criteria were true the barcode cluster was counted as a unique insertion 773

and the prime location was used as the mapping locus by ET-Seq. An on-target insertion was 774

evaluated as a barcode cluster with a prime location within 200 bp downstream of the 3’ end of 775



https://doi.org/10.1101/2020.07.17.209189


36

the protospacer target. Finally all distances reported from the protospacer target site were 776

calculated from the last base pair of the 3’ end of the protospacer. 777

778

VcDART-mediated targeted editing in a community. 779

VcDART vectors encoding constitutive VcCasTn, constitutive bla:aadA Tn cargo (2.7 kbp), and 780

either a non-targeting (pBFC0888), K. michiganensis M5aI pyrF-targeting (pBFC0825), or P. 781

simiae WCS417 pyrF-targeting (pBFC0837) constitutive crRNA were transformed into E. coli 782

WM3064. Conjugations of these vectors into the nine-member community were performed as 783

described above on filter-topped LB agar plates with 12 hr incubation at 30°C. Lawns were 784

scraped from filters into 10 mL LB medium, vortexed, and 1 OD600*mL from each lawn was plated 785

on LB agar supplemented with 1 mg mL-1 5-FOA, 100 μg mL-1 carbenicillin, 100 μg mL-1 786

streptomycin, and 100 μg mL-1 spectinomycin. Following 3 days of incubation at 30°C, all cells 787

were scraped from the agar into 10 mL R2A medium, vortexed, diluted into 10 mL R2A 788

supplemented with 20 mg mL-1 uracil (for no selection controls) or R2A with uracil, 5-FOA, 789

carbenicillin, streptomycin, and spectinomycin to OD600 = 0.02, and split evenly across 4 wells 790

(2.5 mL/well) of a 24 deep well plate. After cultivation at 30°C and 750 rpm for 1 week, only the 791

cultures conjugated with VcDART containing pyrF-targeting crRNA had grown in presence of 792

antibiotics and 5-FOA. A small portion of each of these cultures was serially diluted in R2A and 793

plated on LB agar plus antibiotics to isolate and assay colonies by targeted PCR and Sanger 794

sequencing of pyrF loci. The remainder of each culture was centrifuged at 4,000g for 10 min and 795

frozen at -80°C for downstream bacterial 16S rRNA V4 amplicon metagenomic sequencing 796

(Novogene). Relative abundances were calculated as described below (16S rRNA V4 amplicon 797

analysis) for pre-conjugation nine-member community cultures and post-selection pyrF-targeted 798

cultures. 799

800

16S rRNA V4 amplicon analysis 801



https://doi.org/10.1101/2020.07.17.209189


37

16S rRNA V4 amplicon sequencing was conducted using the 515F (5’-802

GTGCCAGCMGCCGCGGTAA-3’) and 806R (5’-GGACTACHVGGGTWTCTAAT-3’) universal 803

bacterial primer set to generate 250 bp x 2 reads (Novogene). Samples were processed using 804

the UPARSE pipeline within the USEARCH software package to merge read pairs, remove 805

primers, quality filter sequences, remove chimeras, identify unique sequence variants (ZOTUs), 806

and quantify their abundance across samples as described previously24. To assign ZOTU 807

sequences to species known to be in our community mixture, we queried all 890 identified ZOTUs 808

against a custom database of 16S sequences derived from the genomes of the nine-member 809

community constituents using USEARCH25 ZOTUs with 100% identity to a 16S sequence in our 810

database were assigned to the matched species, and all matches < 100% identity were counted 811

as “Other”. Counts coming from ZOTUs of the same taxonomic assignment were merged and the 812

relative abundance of a species was calculated as its read counts divided by the total read counts 813

for the sample. 814

815

Statistics and reproducibility 816

All transformations (natural transformation, conjugation, electroporation) and subsequent 817

analyses were performed for three independent replicates. 818

819

Reporting summary 820

Further information on research design is available in the Nature Research Reporting Summary 821

linked to this paper. 822

823

Data availability 824

Summary data for genomes, plasmids, and oligonucleotides used in this study can be found in 825

Supplementary tables 1-4. Sequence data for all genomes assembled as part of this study and 826

newly constructed plasmids are in submission to NCBI with accession numbers pending. 827



https://doi.org/10.1101/2020.07.17.209189


38

Sequence data for genomes taken from Huddy, et al8 are in submission to NCBI with accession 828

numbers pending. Sequence data for genomes taken from Kantor, et al9 are available under NCBI 829

BioProject accession no. PRJNA279279. All genomes and plasmids used in the project will also 830

be made available on ggKbase (https://ggkbase.berkeley.edu/). Raw count data for all 831

experiments including both metagenome and ET-seq information is available at 832

https://github.com/SDmetagenomics/ETsuite/tree/master/manuscript_data. 833

834

Code availability 835

Custom R scripts for ET-Seq analysis and code used in the construction of figures are available 836

at https://github.com/SDmetagenomics/ETsuite. 837

838

Methods references 839

1. Adler, B. A. et al. Systematic Discovery of Salmonella Phage-Host Interactions via High-840

Throughput Genome-Wide Screens. doi:10.1101/2020.04.27.058388. 841

2. Klompe, S. E., Vo, P. L. H., Halpin-Healy, T. S. & Sternberg, S. H. Transposon-encoded 842

CRISPR–Cas systems direct RNA-guided DNA integration. Nature 571, 219–225 (2019). 843

3. Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. 844

Science 365, 48–53 (2019). 845

4. Liu, H. et al. Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria. 846

mSystems 3, (2018). 847

5. Liu, H. et al. Large-scale chemical-genetics of the human gut bacterium Bacteroides 848

thetaiotaomicron. BioRxiv (2019). 849

6. Devon, R. S., Porteous, D. J. & Brookes, A. J. Splinkerettes--improved vectorettes for 850

greater efficiency in PCR walking. Nucleic Acids Res. 23, 1644–1645 (1995). 851

7. Barquist, L. et al. The TraDIS toolkit: sequencing and analysis for dense transposon mutant 852



https://doi.org/10.1101/2020.07.17.209189


39

libraries. Bioinformatics 32, 1109–1111 (2016). 853

8. Huddy, R. J. et al. Thiocyanate and organic carbon inputs drive convergent selection for 854

specific autotrophic Afipia and Thiobacillus strains within complex microbiomes. bioRxiv 855

2020.04.29.067207 (2020) doi:10.1101/2020.04.29.067207. 856

9. Kantor, R. S. et al. Genome-Resolved Meta-Omics Ties Microbial Dynamics to Process 857

Performance in Biotechnology for Thiocyanate Degradation. Environ. Sci. Technol. 51, 858

2944–2953 (2017). 859

10. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate 860

genomic comparisons that enables improved genome recovery from metagenomes through 861

de-replication. ISME J. 11, 2864–2868 (2017). 862

11. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: 863

assessing the quality of microbial genomes recovered from isolates, single cells, and 864

metagenomes. Genome Res. 25, 1043–1055 (2015). 865

12. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify 866

genomes with the Genome Taxonomy Database. Bioinformatics (2019) 867

doi:10.1093/bioinformatics/btz848. 868

13. Diamond, S. et al. Mediterranean grassland soil C-N compound turnover is dependent on 869

rainfall and depth, and is mediated by genomically divergent microorganisms. Nat Microbiol 870

4, 1356–1367 (2019). 871

14. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high 872

throughput. Nucleic Acids Res. 32, 1792–1797 (2004). 873

15. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective 874

stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 875

268–274 (2015). 876

16. Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for 877

single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 878



https://doi.org/10.1101/2020.07.17.209189


40

1420–1428 (2012). 879

17. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site 880

identification. BMC Bioinformatics 11, 119 (2010). 881

18. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA 882

genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997). 883

19. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. 884

EMBnet.journal 17, 10–12 (2011). 885

20. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 886

9, 357–359 (2012). 887

21. Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to 888

count barcode reads. Bioinformatics 34, 739–747 (2018). 889

22. Costello, M. et al. Characterization and remediation of sample index swaps by non-890

redundant dual indexing on massively parallel sequencing platforms. BMC Genomics 19, 891

332 (2018). 892

23. R Core Team. R: A language and environment for statistical computing. (2013). 893

24. Edgar, R. C. UPARSE: highly accurate OTU sequences from microbial amplicon reads. 894

Nat. Methods 10, 996–998 (2013). 895

25. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 896

26, 2460–2461 (2010). 897

898

Acknowledgments 899

We thank Morgan N. Price for data analysis input, Patrick Pausch for experimental advice, Shana 900

L. McDevitt, Eileen Wagner, and Hitomi Asahara for help with sequencing, and Trent R. Northen 901

for directional advice. Funding was provided by m-CAFEs Microbial Community Analysis & 902

Functional Evaluation in Soils, ([email protected]) a project led by Lawrence Berkeley National 903



https://doi.org/10.1101/2020.07.17.209189


41

Laboratory supported by the U.S. Department of Energy, Office of Science, Office of Biological & 904

Environmental Research under contract number DE-AC02-05CH11231. Support was also 905

provided by the Innovative Genomics Institute at UC Berkeley. B.E.R. and B.F.C. are supported 906

by the National Institute of General Medical Sciences of the National Institute of Health under 907

award numbers F32GM134694 and F32GM131654. Schematics were created with 908

BioRender.com. 909

910

Contributions 911

B.E.R., S.D., B.F.C, A.M.D., J.F.B, and J.A.D. conceived the work and designed the experiments. 912

B.E.R., B.F.C., C.H., M.X., Z.Z., D.C.S., K.T., T.K.O., and N.K. conducted the molecular biology 913

included. S.D., A.C.-C., C.H., and R.S. developed the bioinformatic analysis. B.E.R., S.D., B.F.C., 914

A.M.D., J.F.B., and J.A.D. analyzed and interpreted the data. 915

916

Competing Interests 917

The Regents of the University of California have patents pending related to this work on which 918

B.E.R., S.D., B.F.C., A.M.D., J.F.B., and J.A.D. are inventors. J.A.D. is a co-founder of Caribou 919

Biosciences, Editas Medicine, Intellia Therapeutics, Scribe Therapeutics and Mammoth 920

Biosciences, a scientific advisory board member of Caribou Biosciences, Intellia Therapeutics, 921

eFFECTOR Therapeutics, Scribe Therapeutics, Synthego, Mammoth Biosciences and Inari, and 922

is a Director at Johnson & Johnson and has sponsored research projects by Biogen, Roche and 923

Pfizer. J.F.B. is a founder of Metagenomi. 924

925

Additional Information 926

Correspondence and request for materials should be addressed to J.A.D. and J.F.B. 927



https://doi.org/10.1101/2020.07.17.209189


42

Extended Data 928

929

Extended Data Fig. 1 | Library preparation and data normalization for ET-Seq. a, ET-Seq requires low-930

coverage metagenomic sequencing and customized insertion sequencing. Insertion sequencing relies on 931

custom splinkerette adaptors, which minimize non-specific amplification, a digestion step for degradation 932

of delivery vector containing fragments, and nested PCR to enrich for fragments containing insertions with 933



https://doi.org/10.1101/2020.07.17.209189


43

high specificity. The second round of nested PCR adds unique dual index adaptors for Illumina sequencing. 934

b, This insertion sequencing data is first normalized by the reads to internal standard DNA which is added 935

equally to all samples and serves to correct for variation in reads produced per sample. Secondly, it is 936

normalized by the relative metagenomic abundances of the community members. 937



https://doi.org/10.1101/2020.07.17.209189


44

938

Extended Data Fig. 2 | ET-Seq determined insertion efficiencies for all nine consortium members as 939

a fraction of the entire community. ET-Seq determined insertion efficiencies for conjugation, 940



https://doi.org/10.1101/2020.07.17.209189


45

electroporation, and natural transformation on the nine-member synthetic community (n = 3 biological 941

replicates). The values shown are the estimated fraction a constituent species's transformed cells make of 942

the total community population. Control samples received no exogenous DNA. Average relative abundance 943

across all samples is indicated in parentheses (n = 18 independent samples). LOD and LOQ are indicated 944

in plots by short and long dashed lines respectively. 945

946



https://doi.org/10.1101/2020.07.17.209189


46

947

948

Extended Data Fig. 3 | ET-Seq determined insertion efficiencies for all thiocyanate-degrading 949

bioreactor community members as a fraction of the entire community. ET-Seq determined insertion 950

efficiencies for conjugation, electroporation, and natural transformation on the thiocyanate-degrading 951

bioreactor community (n = 3 biological replicates). The values shown are the estimated fraction a 952

constituent species’s transformed cells make of the total community population. Control samples received 953



https://doi.org/10.1101/2020.07.17.209189


47

no exogenous DNA. Average relative abundance across all samples is indicated in parentheses (n = 17 954

independent samples; due to a single failed metagenomic sequencing replicate, see methods). LOD and 955

LOQ are indicated in plots by short and long dashed lines respectively. 956



https://doi.org/10.1101/2020.07.17.209189


48

957



https://doi.org/10.1101/2020.07.17.209189


49

Extended Data Fig. 4 | Benchmarking DART vectors. a, E. coli WM3064 to E. coli BL21(DE3) conjugation, 958

transposition, and selection schematic (top) and guide RNAs targeting the lacZ α-fragment of recipient BL21(DE3), 959

which is absent from donor WM3064 (bottom). b,d,f, Percent selectable transposed colonies is calculated as the 960

number of colonies obtained with gentamycin selection divided by total viable colonies in absence of selection. b, 961

Insertion receiving colonies divided into on- and off-targeted. This was calculated by multiplying % selectable colonies 962

for representative guides in d and f (highlighted by grey bars) by the on- or off-target rates (shown in Fig. 4). c, 963

Transposition with VcDART was tested with three promoters. The variant using the Plac promoter, harvested from 964

pHelper_ShCAST_sgRNA18, was also used for Fig. 4, 5, and Extended Data Fig. 4b (*). d, Efficiencies of VcDART 965

using various promoters. e, Transposition with ShDART was tested with three transcriptional configurations, all using 966

Plac18. The configuration used for characterization of ShCasTn originally18 was also used for Fig. 4 and Extended Data 967

Fig. 4b (*). f, Efficiencies of ShDART using various promoters. b, d, f, Crossbar indicates mean and error bars indicate 968

one standard deviation from the mean (n = 3 biological replicates). Guide RNAs ending in “NT” are non-targeting 969

negative control samples. 970



https://doi.org/10.1101/2020.07.17.209189


Download - Targeted Genome Editing of Bacteria Within Microbial ... · 17/07/2020 · 56 editing within a community of wild-type microbes has not yet been reported12. 57 Here we show that specific

Top Related