1
Targeted Genome Editing of Bacteria Within Microbial 1
Communities 2
3
Benjamin E. Rubin1,2,13, Spencer Diamond1,3,13, Brady F. Cress1,2,13, Alexander Crits-Christoph4, 4
Christine He1,2,3, Michael Xu1,2, Zeyi Zhou1,2, Dylan C. Smock1,2, Kimberly Tang1,2, Trenton K. 5
Owens5, Netravathi Krishnappa1, Rohan Sachdeva1,3, Adam M. Deutschbauer4,5, Jillian F. 6
Banfield1,3,6,7* & Jennifer A. Doudna1,2,8-12* 7
1Innovative Genomics Institute, University of California, Berkeley, CA, USA. 2Department of Molecular and 8
Cell Biology, University of California, Berkeley, CA, USA. 3Department of Earth and Planetary Science, 9
University of California, Berkeley, CA, USA. 4Department of Plant and Microbial Biology, University of 10
California, Berkeley, CA, USA. 5Environmental Genomics and Systems Biology Division, Lawrence 11
Berkeley National Laboratory, Berkeley, California, USA. 6Environmental Science, Policy and Management, 12
University of California, Berkeley, CA, USA. 7School of Earth Sciences, University of Melbourne, 13
Melbourne, Victoria, Australia. 8California Institute for Quantitative Biosciences, University of California, 14
Berkeley, CA, USA. 9Department of Chemistry, University of California, Berkeley, CA, USA. 10Howard 15
Hughes Medical Institute, University of California, Berkeley, CA, USA. 11Molecular Biophysics & Integrated 16
Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA 12Gladstone Institutes, 17
University of California, San Francisco, CA, USA. 13These authors contributed equally to this work: 18
Benjamin E. Rubin, Spencer Diamond, Brady F. Cress. 19
20
*e-mail: [email protected]; [email protected] 21
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
2
Knowledge of microbial gene functions comes from manipulating the DNA of individual 22
species in isolation from their natural communities. While this approach to microbial 23
genetics has been foundational, its requirement for culturable microorganisms has left the 24
majority of microbes and their interactions genetically unexplored. Here we describe a 25
generalizable methodology for editing the genomes of specific organisms within a 26
complex microbial community. First, we identified genetically tractable bacteria within a 27
community using a new approach, Environmental Transformation Sequencing (ET-Seq), 28
in which non-targeted transposon integrations were mapped and quantified following 29
community delivery. ET-Seq was repeated with multiple delivery strategies for both a nine-30
member synthetic bacterial community and a ~200-member microbial bioremediation 31
community. We achieved insertions in 10 species not previously isolated and identified 32
natural competence for foreign DNA integration that depends on the presence of the 33
community. Second, we developed and used DNA-editing All-in-one RNA-guided CRISPR-34
Cas Transposase (DART) systems for targeted DNA insertion into organisms identified as 35
tractable by ET-Seq, enabling organism- and locus-specific genetic manipulation within 36
the community context. These results demonstrate a strategy for targeted genome editing 37
of specific organisms within microbial communities, establishing a new paradigm for 38
microbial manipulation relevant to research and applications in human, environmental, 39
and industrial microbiomes. 40
41
Genetic mutation and observation of phenotypic outcomes are the primary means of deciphering 42
gene function in microorganisms. This classical genetic approach requires manipulation of 43
isolated species, limiting knowledge in three fundamental ways. First, the vast majority of 44
microorganisms have not been isolated in the laboratory and are thus largely untouched by 45
molecular genetics1. Second, emergent properties of microbial communities that may arise due 46
to interactions between their constituents, remain mostly unexplored2. Third, microorganisms 47
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
3
grown and studied in isolation quickly adapt to their new lab environment, obscuring their true 48
“wild type” physiology3. Since most microorganisms relevant to the environment, industry and 49
health live in communities, approaches for precision genome modification (editing) in community 50
contexts will be transformative. 51
Advances toward genome editing within microbial communities have included assessing 52
gene transfer to microbiomes using selectable markers4–9, microbiome manipulation leveraging 53
pre-modified isolates10, and use of temperate phage for species-specific integration of genetic 54
payloads11. However, a generalizable strategy for programmable organism- and locus-specific 55
editing within a community of wild-type microbes has not yet been reported12. 56
Here we show that specific organisms within microbial communities can be targeted for 57
site-specific genome editing, enabling manipulation of species without requiring prior isolation or 58
engineering. Using a new method developed for this study, Environmental Transformation 59
Sequencing (ET-Seq), we identified genetically accessible species within a nine-member 60
synthetic community and among previously non-isolated species in a 197-member bioremediation 61
community. These results enabled targeted genome editing of microbes in the nine-member 62
community using DNA-editing All-in-one RNA-guided CRISPR-Cas Transposase (DART) 63
systems developed here. The resulting species-specific editing provides the first broadly 64
applicable strategy for organism- and locus-specific genetic manipulation within a microbial 65
community, hinting at new emergent properties of member organisms and methods for controlling 66
microorganisms within their native environments. 67
68
ET-Seq detects genetically accessible microbial community members 69
Editing organisms within a complex microbiome requires knowing which constituents are 70
accessible to nucleic acid delivery and editing. We developed ET-Seq to assess the ability of 71
individual species within a microbial community to acquire and integrate exogenous DNA (Fig. 72
1a). In ET-Seq, a microbial community is exposed to a randomly integrating mobile genetic 73
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
4
element (here, a mariner transposon), and in the absence of any selection, total community DNA 74
is then extracted and sequenced using two protocols. In the first, we enrich and sequence the 75
junctions between the inserted and host DNA to determine insertion location and quantity in each 76
host. This step requires comparison of the junctions to previously sequenced community 77
reference genomes. In the second, we conduct low-depth metagenomic sequencing to quantify 78
the abundance of each community member in a sample (Extended Data Fig. 1a). Together these 79
sequencing procedures provide relative insertion efficiencies for microbiome members. To 80
normalize these data according to a known internal standard, we add to every sample a uniform 81
amount of genomic DNA from an organism that has previously been transformed with, and 82
selected for, a mariner transposon. In the internal standard, we expect every genome to contain 83
an insertion. The final output of ET-Seq is a fractional number representing the proportion of a 84
target organism’s population that harbored transposon insertions at the time DNA was extracted, 85
or insertion efficiency (Extended Data Fig. 1b). To facilitate the analysis of these disparate data, 86
we developed a complete bioinformatic pipeline for quantifying insertions and normalizing results 87
according to both the internal control and metagenomic abundance 88
(https://github.com/SDmetagenomics/ETsuite and Methods). Together the experimental and 89
bioinformatic approaches of ET-Seq reveal species-specific genetic accessibility by measuring 90
the percentage of each member of a given microbiome that acquires a transposon insertion. 91
92
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
5
93
Fig. 1 | ET-Seq for quantitative measurement of insertion efficiency in a microbial community. a, ET-94
Seq provides data on insertion efficiency of multiple delivery approaches, including conjugation, 95
electroporation, and natural DNA transformation, on microbial community members. In this illustrative 96
example, the blue strain is most amenable to electroporation (star). This data allows for the determination 97
of feasible targets and delivery methods for DART targeted editing. b, ET-Seq determined efficiencies for 98
known quantities of spiked-in pre-edited K. michiganensis (Klebsiella_1). Data shown is the mean of three 99
technical replicates. LOD is the lowest insertion fraction at which accurate detection of insertions is 100
expected and LOQ is the lower limit at which this fraction is expected to be quantifiable. Solid line is the fit 101
of the linear regression to the data not including zero (n = 4 independent samples) that is used to calculate 102
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
6
LOD and LOQ (slope = 0.2137; intercept = 1.813*10-6). c-d, ET-Seq determined insertion efficiencies in the 103
nine-member consortium (n = 3 biological replicates) with conjugative delivery shown as c, a portion of the 104
entire community and d, a portion of each species. Control samples received no exogenous DNA. Average 105
relative abundances of community constituents across conjugation samples (n = 6 independent samples) 106
are indicated in parentheses. LOD and LOQ are indicated in plots by short and long dashed lines 107
respectively. 108
109
ET-Seq was developed and tested on a nine-member microbial consortium made up of 110
bacteria from three phyla that are often detected and play important metabolic roles within soil 111
microbial communities (Supplementary Table 1). We initially endeavored to test the accuracy and 112
detection limit by adding to the nine-member community a known amount of a previously prepared 113
mariner transposon library of one of its member species, Klebsiella michiganensis M5a1 114
(Klebsiella_1). The ET-Seq derived insertion efficiencies were closely correlated to the known 115
fractions of edited K. michiganensis present in each sample (Fig. 1b). Using this data we 116
calculated a limit of detection (LOD) and limit of quantification (LOQ) for our approach (Methods). 117
The LOD suggests that a fraction of ≥ 8.4*10-6 of transformed cells out of the total community 118
would be detectable by ET-Seq. 119
Next, the mariner transposon vector was delivered to the nine-member community through 120
conjugation. We could measure conjugation reproducibly and quantitatively in the three species 121
that grew to make up over 99% of the community (Fig. 1c). We further normalized insertion 122
efficiency in each species according to its abundance so that their insertion efficiencies represent 123
insertion containing cells as a portion of total cells for each species (Fig. 1d). Even for 124
Paraburkholderia caledonica, which made up ~0.1% of the community, we could measure 125
insertions. We detected no insertions in the remaining community members, which was expected 126
given their extreme rarity in the community (less than 0.05%). 127
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
7
We next used ET-Seq to compare insertion efficiencies in the nine-member community 128
after transposon delivery by conjugation, natural transformation with no induction of competence, 129
or electroporation of the transposon vector. Together these approaches showed reproducible 130
insertion efficiencies above the LOD in five of the nine community members (Fig. 2a and Extended 131
Data Fig. 2). Additionally we could identify preferred delivery methods for some members in this 132
community context, such as electroporation being consistently reproducible for Dyella japonica 133
UNC79MFTsu3.2 while conjugation was not. These results show that ET-Seq can identify and 134
quantify genetic manipulation of microbial community members and reveal suitable DNA delivery 135
methods for each. 136
137
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
8
138
Fig. 2 | ET-Seq detection of insertion efficiency across multiple delivery approaches. a, ET-Seq 139
determined insertion efficiencies for conjugation, electroporation, and natural transformation on the nine-140
member consortium (n = 3 biological replicates). Only members with at least one positive insertion efficiency 141
value across the delivery methods are shown. Average relative abundance across all samples (n = 18 142
independent samples) is indicated in parentheses. b, Comparing delivery strategies across data from all 143
organisms. Cross bars indicate the mean value and whiskers denote the 95% confidence interval for the 144
mean (Conjugation n = 9; Electroporation n = 14; Natural Transformation n = 9). c, Comparison of insertion 145
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
9
efficiencies measured for natural transformation of K. michiganensis (Klebsiella_1) in isolated culture (n = 146
3 biological replicates) compared to K. michiganensis grown in the community context (n = 3 biological 147
replicates). 148
149
Notably, five organisms exhibited some degree of natural competency, although average 150
efficiencies were significantly lower for natural transformation than for delivery through 151
electroporation (ANOVA followed by Tukey's HSD; two-sided test; p = 0.0009) (Fig. 2b). To the 152
best of our knowledge, no isolates of the Klebsiella genus including K. michiganensis are known 153
to be naturally competent. We conducted a second experiment to compare the insertion efficiency 154
of K. michiganensis cultivated in isolation versus grown in the community context. ET-Seq 155
returned no values above the LOD for natural transformation of K. michiganensis in isolation, but 156
within the community ET-Seq returned values well into the quantifiable range (Fig. 2c). This 157
apparent emergent property of natural competence within a small synthetic community provides 158
tantalizing support for the possibility of community induced natural transformation, an idea suggested 159
in previous work, but experimentally unstudied due to lack of tools for measuring horizontal gene 160
transfer events within a community13,14. 161
162
Genetic accessibility of uncultivated species within an environmental microbiome 163
To test the potential for editing in a complex and environmentally realistic community that has not 164
been reduced to isolates, we conducted ET-Seq on a genomically characterized 197 member 165
bioreactor-derived consortium that degrades thiocyanate (SCN-), a toxic byproduct of gold 166
processing15. We sampled biofilm from the reactor and conducted ET-Seq with a panel of delivery 167
techniques: conjugation, electroporation, and natural transformation. Across ET-Seq replicates at 168
least one measurement above the LOD was identified for 15 members of the bioreactor 169
community. We also note that the transformed organisms make up ~87% of the bacterial fraction 170
by relative abundance (Fig. 3a, Extended Data Fig. 3). Ten of these organisms were species that 171
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
10
had not previously been isolated or edited (Supplementary Table 1), and overall members from 7 172
of the 12 phyla detected in this consortium were successfully transformed (Fig. 3b). This included 173
an Afipia sp. known to play an important role in the thiocyanate degradation process. Additionally, 174
one of the transposon recipients, Microbacterium ginsengisoli (Microbacterium_3), is a putative 175
host for Saccharibacteria, a candidate phyla radiation (CPR) organism that has been observed in 176
this reactor system16. Notably, members of the CPR are resistant to typical isolation techniques 177
due to heavy dependence on other community members, and little is known about the nature of 178
their likely symbiotic relationships with other organisms17. Here, ET-Seq has uncovered a 179
genetically tractable putative host organism, raising the possibility of genetically editing the host 180
to probe CPR/host symbiotic relationships within a complex microbial community. In this way, ET-181
Seq reveals genetic accessibility and the tools necessary to achieve it in previously 182
unapproachable and biologically important members of an environmentally relevant community. 183
184
Fig. 3 | ET-Seq detection of insertion efficiency in thiocyanate-degrading bioreactor. a, ET-Seq 185
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
11
determined insertion efficiencies for conjugation, electroporation, and natural transformation on the 186
thiocyanate-degrading bioreactor community (n = 3 biological replicates). Average relative abundance 187
across all samples is indicated in parentheses (n = 17 independent samples). b, A ribosomal protein S3 188
(rpS3) phylogenetic tree of all organisms in the thiocyanate-degrading bioreactor genome database. Only 189
community members receiving at least one insertion by conjugation, electroporation, or natural 190
transformation detectable above LOD are indicated by filled circles. Filled circles indicate success of 191
method and open circles indicate method was not detected. A star indicates genomically encoded SCN- 192
degradation capacity in organisms shown to be edited by at least one method. Tree was constructed from 193
an alignment of 262 rps3 protein sequences using IQ-TREE. 194
195
Targeted genome editing in microbial communities using CRISPR-Cas transposases 196
The ability to introduce genome edits to a single type of organism in a microbial community and 197
to target those edits to a defined location within its genome would be a foundational advance in 198
microbiological research and would have many useful applications. We reasoned that RNA-199
guided CRISPR-Cas Tn7 transposases could provide the ability to both ablate function of targeted 200
genes and deliver customized genetic cargo in organisms shown to be genetically tractable by 201
ET-Seq18–20 (Fig. 1a). However, the two-plasmid ShCasTn18 and three-plasmid VcCasTn19 202
systems are not amenable to efficient delivery within complex microbial communities or even 203
beyond E. coli due to their multiple plasmids. Since ET-Seq identified conjugation and 204
electroporation as broadly effective delivery approaches in the tested communities, we designed 205
and constructed all-in-one conjugative versions of these CasTn vectors that could be used for 206
delivery by either strategy (Fig. 4a and Methods). These DART systems are barcoded and 207
compatible with the same sequencing methods used for ET-Seq, and can be used to assay the 208
efficacy of CRISPR-Cas-guided transposition into the genome of a target organism. 209
210
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
12
211
Fig. 4 | Benchmarking all-in-one conjugal targeted vectors. a, Schematic of VcDART and ShDART 212
delivery vectors. b, Fraction of insertions that occur in a 60 bp window around the target site. Mean for 213
three independent biological replicates is shown as cross bars. c-d, Aggregate unique insertion counts (n 214
= 3 biological replicates) across the E. coli BL21(DE3) genome, determined by presence of unique barcodes, 215
using c, VcDART and d, ShDART. The inset shows a 60 bp window downstream of the target site. Insertion 216
distance downstream of the target site is calculated from the 3’ end of the protospacer. 217
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
13
218
We compared the transposition efficiency and specificity of the DART systems in E. coli 219
in order to select the most promising candidate for targeted genome editing in microbial 220
communities. VcDART and ShDART systems harboring GmR cargo with a lacZ-targeting or non-221
targeting guide RNA were conjugated into E. coli to quantify transposition efficiency, and target 222
site specificity was assayed using ET-Seq following outgrowth of transconjugants in selective 223
medium (Methods and Extended Data Fig. 4a). While ShDART yielded approximately tenfold 224
more colonies possessing insertions than ShDART (Extended Data Fig. 4), >92% of the 225
selectable colonies obtained using ShDART were off-target, compared to no detectable off-target 226
insertions for VcDART (Fig. 4b-d). Due to VcDART’s high target site specificity and the 227
undesirable propensity for ShCasTn to co-integrate its donor plasmid21,22, we focused on VcDART 228
to test the potential for targeted microbial community genome editing. 229
230
Targeted microbial community editing by programmable transposition 231
We reasoned that RNA-programmed transposition could be deployed for targeted editing of 232
specific types of organisms within a microbial consortium. ET-Seq had shown two species within 233
the nine-member community, K. michiganesis and Pseudomonas simiae WCS417, to be both 234
abundant and tractable by conjugation (Fig. 1d). We targeted both of these organisms using 235
conjugation to introduce the VcDART vector into the community with guide RNAs specific to their 236
genomes (Fig. 5a). Insertions were designed to produce loss-of-function mutations in the K. 237
michiganesis and P. simiae pyrF gene, an endogenous counterselectable marker allowing growth 238
in the presence of 5-fluoroorotic acid (5-FOA) when disrupted. The transposons carried two 239
antibiotic resistance markers conferring resistance to streptomycin and spectinomycin (aadA) and 240
carbenicillin (bla). Together the simultaneous loss-of-function and gain-of-function mutations 241
allowed for a strong selective regime. VcDART targeted to K. michiganensis or to P. simiae pyrF 242
followed by selection led to enrichment of these organisms to ~98% and ~97% pure culture 243
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
14
respectively (Fig. 5b). No outgrowth was detected when using a guide RNA that did not target 244
these respective microbial genomes. Recovered transformant colonies of K. michiganensis and 245
P. simiae analyzed by PCR and Sanger sequencing showed full length, pyrF-disrupting VcDART 246
transposon insertions 48-49 bp downstream of the guide RNA target site, consistent with 247
CRISPR-Cas transposase-catalyzed transposition events at the desired genomic location (Fig. 248
4c and 5c). These results demonstrate that targeted genome editing using DART enables genetic 249
manipulation of distinct members of a complex microbial community. This targeted editing of 250
microorganisms in a community context can also enable subsequent exploitation of modified 251
phenotypes. 252
253
254
Fig. 5 | Targeted editing in the nine-member consortium. a, Conjugative VcDART delivery into a 255
microbial community using species-specific crRNA, followed by selection for transposon cargo, facilitates 256
selective enrichment of targeted organisms. b, Relative abundance of nine-member community 257
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
15
constituents measured by 16S rRNA sequencing before conjugative VcDART delivery and after selection 258
for pyrF-targeted transposition in K. michiganensis or P. simiae (n = 3 biological replicates). * indicates no 259
growth detected in selective medium. c, Representative Sanger sequencing chromatogram of PCR product 260
spanning transposon insertion site at targeted pyrF locus in K. michiganensis (top) and P. simiae (bottom) 261
colonies following VcDART-mediated transposon integration and selection. Target-site duplications (TSD) 262
are indicated with dashed boxes. 263
264
Discussion 265
We have demonstrated organism- and locus-specific genome editing within a microbial 266
community, providing a new approach to microbial genetics and microbiome manipulation for 267
research and applications. ET-Seq revealed the genetic accessibility of organisms growing within 268
microbial communities, including ten microbial species that had not been previously isolated or 269
found to be genetically manipulated. The creation of all-in-one vectors encoding two naturally 270
occurring CRISPR-Cas transposon systems enabled comparison of their targeted genome editing 271
capabilities. These experiments showed that only one of the two systems, which we termed 272
VcDART, enabled precise RNA-programmable microbial genome editing. The ability to conduct 273
targeted genome editing of two bacteria within a nine-member synthetic community and to use 274
the introduced genetic changes as a means of organism isolation demonstrates a new approach 275
to microbiome manipulation. Traditionally, the combined steps of culturing an environmental 276
microbe, determining the ideal means to transform it, and implementing targeted editing could 277
take years or could fail altogether23. ET-Seq combined with VcDART compresses the pipeline for 278
establishing genetics in microorganisms to weeks and expands the diversity of organisms that 279
can be targeted beyond those that can be cultivated in isolation. 280
In addition to providing a roadmap for targeted microbial genome editing, ET-Seq can be 281
used to discover and analyze horizontal gene transfer in complex communities. In this study, we 282
observed unexpected community-dependent natural transformation in the nine-member 283
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
16
community and characterized the horizontal gene transfer events experimentally in the complex 284
microbiome of a thiocyanate-degrading bioreactor. In future experiments, multiple ET-Seq time 285
points could be taken in a community after delivery to measure directly both the portion of the 286
community receiving DNA and the persistence of gene transfer, rather than tracking horizontal 287
gene transfer using bioinformatics24 or indirect experimental methods4–9. Furthermore, as ET-Seq 288
is applied in increasingly diverse and complex environments, an atlas of editable taxa can be 289
created including optimal delivery approaches. To expand this dataset, we plan to apply ET-Seq 290
to new microbial communities being sampled for metagenomic sequencing, a natural pairing 291
because ET-Seq depends on availability of genome sequences for the component organisms. 292
In the future, tools are needed to create more generally applicable and persistent targeted 293
genome edits. The canonical approach involving antibiotic selection for edited bacteria is 294
infeasible in large complex communities, where many natural resistances exist25,26. Even in the 295
nine-member synthetic community used in this study, three antibiotics and counterselection were 296
necessary to achieve strict selection. Improved delivery strategies and alternative positive and 297
counter selection methods should enable more efficient editing. In the gut microbiome, porphyran 298
has been used successfully for selection of a spiked-in organism capable of utilizing this 299
compound27. Such approaches for more efficient microbial community editing will enable research 300
to answer fundamental questions as well as allow manipulation of agricultural, industrial, and 301
health-relevant microbiomes. The combination of ET-Seq and DART systems presented here 302
provide the foundation of the new field of in situ microbial genetics. 303
304
References 305
1. Steen, A. D. et al. High proportions of bacteria and archaea across most biomes remain 306
uncultured. ISME J. (2019) doi:10.1038/s41396-019-0484-y. 307
2. Pascual-García, A., Bonhoeffer, S. & Bell, T. Metabolically cohesive microbial consortia and 308
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
17
ecosystem functioning. Philos. Trans. R. Soc. Lond. B Biol. Sci. 375, 20190245 (2020). 309
3. Fux, C. A., Shirtliff, M., Stoodley, P. & Costerton, J. W. Can laboratory reference strains 310
mirror ‘real-world’ pathogenesis? Trends Microbiol. 13, 58–63 (2005). 311
4. Pukall, R., Tschäpe, H. & Smalla, K. Monitoring the spread of broad host and narrow host 312
range plasmids in soil microcosms. FEMS Microbiol. Ecol. 20, 53–66 (1996). 313
5. De Gelder, L., Vandecasteele, F. P. J., Brown, C. J., Forney, L. J. & Top, E. M. Plasmid 314
Donor Affects Host Range of Promiscuous IncP-1β Plasmid pB10 in an Activated-Sludge 315
Microbial Community. Appl. Environ. Microbiol. 71, 5309–5317 (2005). 316
6. Musovic, S., Oregaard, G., Kroer, N. & Sørensen, S. J. Cultivation-Independent 317
Examination of Horizontal Transfer and Host Range of an IncP-1 Plasmid among Gram-318
Positive and Gram-Negative Bacteria Indigenous to the Barley Rhizosphere. Applied and 319
Environmental Microbiology vol. 72 6687–6692 (2006). 320
7. Musovic, S., Klümper, U., Dechesne, A., Magid, J. & Smets, B. F. Long-term manure 321
exposure increases soil bacterial community potential for plasmid uptake. Environ. 322
Microbiol. Rep. 6, 125–130 (2014). 323
8. Klümper, U. et al. Broad host range plasmids can invade an unexpectedly diverse fraction 324
of a soil bacterial community. ISME J. 9, 934–945 (2015). 325
9. Ronda, C., Chen, S. P., Cabral, V., Yaung, S. J. & Wang, H. H. Metagenomic engineering 326
of the mammalian gut microbiome in situ. Nat. Methods 16, 167–170 (2019). 327
10. Farzadfard, F., Gharaei, N., Citorik, R. J. & Lu, T. K. Efficient Retroelement-Mediated DNA 328
Writing in Bacteria. bioRxiv (2020). 329
11. Hsu, B. B., Way, J. C. & Silver, P. A. Stable Neutralization of a Virulence Factor in Bacteria 330
Using Temperate Phage in the Mammalian Gut. mSystems 5, (2020). 331
12. Sheth, R. U., Cabral, V., Chen, S. P. & Wang, H. H. Manipulating Bacterial Communities by 332
in situ Microbiome Engineering. Trends Genet. 32, 189–200 (2016). 333
13. Wang, X. et al. Across Genus Plasmid Transformation Between Bacillus subtilis and 334
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
18
Escherichia coli and the Effect of Escherichia coli on the Transforming Ability of Free 335
Plasmid DNA. Current Microbiology vol. 54 450–456 (2007). 336
14. Borgeaud, S., Metzger, L. C., Scrignari, T. & Blokesch, M. The type VI secretion system of 337
Vibrio cholerae fosters horizontal gene transfer. Science 347, 63–67 (2015). 338
15. Kantor, R. S. et al. Genome-Resolved Meta-Omics Ties Microbial Dynamics to Process 339
Performance in Biotechnology for Thiocyanate Degradation. Environ. Sci. Technol. 51, 340
2944–2953 (2017). 341
16. Huddy, R. J. et al. Thiocyanate and organic carbon inputs drive convergent selection for 342
specific autotrophic Afipia and Thiobacillus strains within complex microbiomes. bioRxiv 343
2020.04.29.067207 (2020) doi:10.1101/2020.04.29.067207. 344
17. Castelle, C. J. et al. Biosynthetic capacity, metabolic variety and unusual biology in the 345
CPR and DPANN radiations. Nat. Rev. Microbiol. 16, 629–645 (2018). 346
18. Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. 347
Science 365, 48–53 (2019). 348
19. Klompe, S. E., Vo, P. L. H., Halpin-Healy, T. S. & Sternberg, S. H. Transposon-encoded 349
CRISPR–Cas systems direct RNA-guided DNA integration. Nature 571, 219–225 (2019). 350
20. Petassi, M. T., Hsieh, S.-C. & Peters, J. E. Guide RNA categorization enables target site 351
choice in Tn7-CRISPR-Cas transposons. bioRxiv 2020.07.02.184150 (2020) 352
doi:10.1101/2020.07.02.184150. 353
21. Rice, P. A., Craig, N. L. & Dyda, F. Comment on ‘RNA-guided DNA insertion with CRISPR-354
associated transposases’. Science 368, (2020). 355
22. Strecker, J., Ladha, A., Makarova, K. S., Koonin, E. V. & Zhang, F. Response to Comment 356
on ‘RNA-guided DNA insertion with CRISPR-associated transposases’. Science 368, 357
(2020). 358
23. Laurenceau, R. et al. Toward a genetic system in the marine cyanobacterium 359
Prochlorococcus. Access Microbiology 6, 23 (2020). 360
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
19
24. Soucy, S. M., Huang, J. & Gogarten, J. P. Horizontal gene transfer: building the web of life. 361
Nat. Rev. Genet. 16, 472–482 (2015). 362
25. Hu, Y. et al. Metagenome-wide analysis of antibiotic resistance genes in a large cohort of 363
human gut microbiota. Nat. Commun. 4, 2151 (2013). 364
26. Forsberg, K. J. et al. Bacterial phylogeny structures soil resistomes across habitats. Nature 365
509, 612–616 (2014). 366
27. Shepherd, E. S., DeLoache, W. C., Pruss, K. M., Whitaker, W. R. & Sonnenburg, J. L. An 367
exclusive metabolic niche enables strain engraftment in the gut microbiota. Nature 557, 368
434–438 (2018). 369
370
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
20
Methods 371
Plasmid construction and barcoding 372
For ET-Seq measurement of genetic tractability in community members, DNA encoding a non-373
targeted mariner transposon was delivered. The mariner transposon integrates into “TA” 374
sequences in recipient cells. For delivery of the mariner transposon, we made use of the 375
previously created pHLL250 vector, which contains an RP4 origin of transfer (oriT), AmpR, 376
conditional (pir+-dependent) R6K origin, and an AseI restriction site to facilitate depletion of vector 377
from DNA samples in ET-Seq library preparations1. Unique to each transposon on this vector is 378
a random 20 bp barcode sequence to aid in the discrimination of unique insertion events from 379
duplications of the same insertion due to cell division or PCR. 380
DART vectors were designed to encode all components required for delivery and editing 381
(Supplementary Table 2 and Extended Data Fig. 4). VcCasTn genes, crRNA, and Tn were 382
synthesized as gBlocks (IDT). pHelper_ShCAST_sgRNA was a gift from Feng Zhang (Addgene 383
plasmid #127921; http://n2t.net/addgene:127921; RRID:Addgene_127921) and was used to 384
clone ShCasTn genes and sgRNA. pDonor_ShCAST_kanR was a gift from Feng Zhang 385
(Addgene plasmid # 127924 ; http://n2t.net/addgene:127924 ; RRID:Addgene_127924) and was 386
used to clone the ShCasTn transposon. tns genes, cas genes, and crRNA/sgRNA were 387
consolidated into a single operon (with various promoters and transcriptional configurations) on 388
the same vector as the cognate transposon. The left end of the cognate Tn was encoded 389
downstream of the crRNA/sgRNA, followed by Tn cargo, barcode, and Tn right end. DART Tn LE 390
and RE were designed to include the minimal sequence that both included all putative TnsB 391
binding sites and was previously shown to be functional2,3. Specifically, VcDART LE (108 bp) and 392
RE (71 bp) each encompass three 20 bp putative TnsB binding sites, spanning from the edge of 393
the 8 bp terminal ends to the edge of the third putative TnsB binding site2. ShDART LE (113 bp) 394
spans the boundaries of the long terminal repeat and both additional putative TnsB binding sites, 395
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
21
while the RE (211 bp) encompasses the long terminal repeat and all four additional putative TnsB 396
binding sites3. 397
Vectors were cloned using BbsI (NEB) Golden Gate assembly of part plasmids, each 398
encoding different regions of the final plasmid. Of note, the backbone encodes RP4 oriT, AmpR, 399
conditional R6K origin, and an AsiSI+SbfI double digestion site for vector depletion during ET-400
Seq library preparations. A 2xBsaI spacer placeholder enabled spacer cloning with BsaI (NEB) 401
Golden Gate. A 2xBsmBI barcode placeholder was encoded immediately inside the Tn right end 402
and was used for barcoding as described below. Part plasmids were propagated in E. coli Mach1-403
T1R (QB3 Macro Lab). Golden Gate reactions for all-in-one vector assembly were purified with 404
DNA Clean & Concentrator-5 (Zymo Research) and electroporated into E. coli EC100D-pir+ 405
(Lucigen). 406
DART vectors were barcoded by BsmBI (NEB) Golden Gate insertion of random barcode 407
PCR product into the 2xBsmBI barcode placeholder using a previously reported method4 with 408
slight modifications. A 56-nt ssDNA oligonucleotide encoding a central tract of 20 degenerate 409
nucleotides (oBFC1397) was amplified with BsmBI-encoding primers oBFC1398 and oBFC1399 410
using Q5 High-Fidelity 2X Master Mix (NEB) in a six-cycle PCR (98°C for 1 min; six cycles of 98°C 411
for 10 s, 58°C for 30 s, and 72°C for 60 s; and 72°C for 5 min). Barcoding Golden Gate reactions 412
were purified with DNA Clean & Concentrator-5. To remove residual non-barcoded vector, 413
reactions were digested with 15 U BsmBI at 55°C for at least 4 hr, heat inactivated at 80°C for 20 414
min, treated with 10 U Plasmid-Safe ATP-Dependent DNase (Lucigen) exonuclease at 37°C for 415
1 hr, heat inactivated at 70°C for 30 min, and purified with DNA Clean & Concentrator-5. 416
Randomly barcoded conjugative vectors were electroporated into E. coli EC100D-pir+, 417
followed 1 hr recovery in 1 mL pre-warmed SOC (NEB) at 37°C 250 rpm, serial dilution and spot 418
plating on LB agar plus 100 μg mL-1 carbenicillin to estimate library diversity, and plating the full 419
transformation across 5 LB agar plates containing carbenicillin (and other appropriate antibiotics 420
when Tn cargo contained other resistance cassettes). To prepare barcoded conjugative vector 421
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
22
plasmid stock, all 5 agar plates were scraped into a single pool and midiprepped (Zymo 422
Research). All conjugations were performed using the diaminopimelic acid (DAP) auxotrophic 423
RP4 conjugal donor E. coli strain WM3064. Donor strains were prepared by electroporation with 424
200 ng barcoded vectors, followed by recovery in SOC plus DAP at 37°C and 250 rpm and 425
inoculation of the entire recovery culture into 15 mL LB containing DAP and carbenicillin in 50 mL 426
conical tubes, followed by overnight cultivation at 37°C and 250 rpm. Donor serial dilutions were 427
spot plated on LB agar plus carbenicillin to estimate final barcode diversity. 428
429
Guide RNA design 430
In all experiments, VcCasTn gRNAs used 32 nt spacers and a 5’-CC Type IF PAM, while 431
ShCasTn gRNAs used 23 nt spacers and a 5’-GTT Cas12k PAM. All gRNAs were designed to 432
bind in the first half of the target CDS to ensure functional knockout by transposon insertion 433
(Supplementary Table 3). Off-target potential was assessed using BLASTn (-dust no -word_size 434
4) of spacers against a local BLAST database created from all genomes present in an experiment, 435
and spacers were discarded if off-target hits with E-value < 15 were identified. gRNAs with less 436
seed region complementarity to off-targets were prioritized. Non-targeting gRNAs were designed 437
by scrambling the spacer until no significant matches were found. 438
439
Delivery methods 440
For natural transformation and electroporation, a culture of the community or isolate to be 441
transformed was subcultured at OD600 = 0.2 and grown to OD600 = 0.5. In the case of the 442
thiocyanate-degrading bioreactor in the absence of accurate OD measurements the culture was 443
outgrown for two hours. For natural transformation 200 ng of vector harboring the mariner 444
transposon (pHLL2501) for non-targeted insertion, or water for the negative control were added 445
to 4 mL of OD600 = 0.5 outgrowth. Cultures were incubated for 3 hours shaking at 250 rpm at 446
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
23
temperature appropriate for the isolate or community before being moved to the appropriate 447
downstream analysis. 448
For electroporation, 20 mL of the community or isolate at OD600 = 0.5 was put on ice, 449
centrifuged at 4,000g at 4°C for 10 minutes, and washed four times with 10 mL sterile ice-cold 450
Milli-Q H2O. After a final centrifugation the pellet was resuspended in 100 μL of 2 ng/μL vector 451
(pHLL250 or VcDART), or 100 μL of water as a negative control. This solution was then pipetted 452
into a 0.2 cm gap ice-cold cuvette and electroporated at 3 kV, 200Ω, and 25 μF. The cells were 453
immediately recovered into 10 mL of the community’s or isolate’s preferred medium and incubated 454
shaking for 3 hours before being moved to the appropriate downstream analysis. 455
E. coli strain WM3064 containing the mariner transposon (pHLL250) for non-targeted 456
editing, or the VcDART for targeted editing was cultured overnight in LB supplemented with 457
carbenicillin (100 μg/mL) and DAP (60 μg/mL) at 37°C. Before conjugation the donor strain was 458
washed twice in LB (centrifugation at 4,000g for 10 minutes) to remove antibiotics. Then, 1 459
OD600*mL of the donor was added to 1 OD600*mL of the recipient community or isolate and the 460
mixture was plated on a 0.45 μm mixed cellulose ester membrane (Millipore) topping a plate of 461
the recipient’s preferred media without DAP. In the case of the thiocyanate-degrading bioreactor, 462
~2 OD600*mL of the donor was added to 2 OD600*mL of the recipient community to ensure 463
sufficient material despite the community's slow growth. Plates were incubated at the ideal 464
temperature for the recipient community or isolate for 12 hours before the growth was scraped off 465
the filter into the media of the recipient community or isolate for downstream analysis. 466
467
ET-Seq library preparation 468
The insertion junction sequencing library prep strategy for ET-Seq can be used (modification may 469
be necessary) in any circumstance where high efficiency mapping of inserted DNA to a host loci 470
is desired. For our purposes, DNA of the edited community or isolate was first extracted using 471
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
24
the DNeasy PowerSoil Kit (QIAGEN). In the case of the nine-member community, 500 ng of DNA 472
was used for both insertion junction sequencing and metagenomic library prep. For the SCN 473
community, which had lower yields of DNA, 100 ng were used. As an internal standard, DNA 474
from a previously constructed mutant library of Bacteroides thetaiotaomicron VPI-54825, a 475
species not present in the nine-member community or the thiocyanate-degrading bioreactor, was 476
spiked into the community DNA at a ratio of 1/500 by mass. The B. thetaiotaomicron library had 477
undergone antibiotic selection for its transposon insertions and was thus assumed to represent 478
100% transformation efficiency (i.e. every genome contained at least one mariner transposon 479
insertion). 480
For metagenomic sequencing, library prep was conducted by the standard ≥100 ng 481
protocol from the NEBNext Ultra II FS DNA Library Prep Kit for Illumina (NEB). For insertion 482
junction sequencing, the same protocol was used with a number of modifications enumerated 483
here (Extended Data Fig. 1). This insertion junction sequencing protocol has also been tested 484
successfully with the ≤ 100 ng protocol of the NEBNext Ultra II FS DNA Library Prep Kit (NEB) 485
and the KAPA HyperPlus Kit (Roche). For fragmentation an 8 minute incubation was used. A 486
custom splinklerette adaptor was used during adaptor ligation to decrease non-specific 487
amplification (Supplementary Table 4)6,7. For size selection 0.15X (by volume) SPRIselect 488
(Beckman Coulter, Cat # B23318) or NEBNext Sample Purification Beads (NEB) were used for 489
the first bead selection and 0.15X (by volume) were added for the second. From this selection, 490
the DNA was eluted in 44 μL (instead of the suggested 15 μL) where it undergoes digestion before 491
enrichment to cleave intact transposon delivery vector. All bead elutions were performed with 492
Sigma Nuclease-Free water. pHLL250 underwent AseI digestion, while DART vectors underwent 493
double digestion by AsiSI and SbfI-HF (NEB) (Supplementary Table 2). The DNA then underwent 494
a sample purification using 1X AMPure XP beads (Beckman Coulter) to prepare it for PCR 495
enrichment. 496
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
25
In PCR enrichment, the transposon junction was amplified by nested PCR. The PCRs 497
followed the NEBNext Ultra II FS DNA Library Prep Kit for Illumina (NEB) PCR protocol, however 498
in the first PCR the primers were custom to the transposon and the adaptor and the PCR was run 499
for 25 cycles (Supplementary Table 4). The enrichment then underwent sample purification with 500
a 0.7X size selection using SPRIselect or NEBNextSample Purification Beads from which 15 μL 501
were eluted for the second PCR. This second PCR used custom unique dual indexing primers 502
specific to nested regions of the insertion and adaptor and 6 cycles are used (Supplementary 503
Table 4). Then another 0.7x size selection was conducted and the final library was eluted in 30 504
μL. Samples for metagenomic sequencing and insertion junction sequencing were then quality 505
controlled and multiplexed using 1X HS dsDNA Qubit (Thermo Fisher) for total sample 506
quantification, Bioanalyzer DNA 12000 chip (Agilent) for sizing, and qPCR (KAPA) for 507
quantification of sequenceable fragments. Samples were sequenced on the iSeq100 or 508
HiSeq4000 platforms. 509
510
Genome sequencing, assembly, taxonomic classification, and database construction 511
For a full list of genome sequences used as read mapping references in this study see 512
Supplementary Table 1. Assembly and annotation of genomes used as references for the SCN 513
bioreactor experiment is described in Huddy et al.8. As the SCN bioreactor has been subjected to 514
numerous genome-resolved metagenomic studies8,9 we endeavored to create a non-redundant 515
database that contained all genomes previously observed in this reactor system. A set of 556 516
genomes assembled from this system were de-replicated at the species level using dRep v2.5.310 517
with an average nucleotide identity (ANI) threshold of 95% and a minimum completeness of 60% 518
as estimated by checkM v1.1.211. A single genome representing each species level group was 519
chosen by dRep based on optimizing genome size, fragmentation, estimated completeness, and 520
estimated contamination resulting in 265 representative genomes. Genomes were taxonomically 521
classified using GTDB-Tk12 with default options. Additionally, to display the taxonomic diversity of 522
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
26
transformed organisms (Fig. 3b), a phylogenetic tree was constructed using ribosomal protein S3 523
(rpS3). Briefly, a custom Hidden Markov Model (HMM) was used to identify rpS3 sequences13 in 524
the 265 representative genomes, and 262 rpS3 sequences were successfully identified. 525
Sequences were aligned using muscle14, and a maximum likelihood phylogenetic tree was 526
constructed using IQ-TREE15. Phylogenetic trees were pruned and annotated using iTol v5 527
(https://itol.embl.de/). To determine if an organism we identified as receiving exogenous DNA was 528
ever previously isolated we used the ANI relative to the closest reference genome in the GTDB-529
Tk output. If one of our genomes had an ANI ≥ 95% relative to a known reference, and this 530
reference genome was generated from an isolated bacterium, our target organism was 531
considered to be previously isolated (Supplementary Table 1). 532
For 9-member community genomes assembled as part of this study, cultures were grown 533
on R2A medium for 24 hours at 30°C and genomic DNA was extracted with the DNeasy Blood 534
and Tissue DNA Kit (Qiagen) with pre-treatment for Gram-positive bacteria. Genomic DNA was 535
sheared mechanically with the Covaris S220 and processed with the NEBNext DNA Library Prep 536
Master Mix Set for Illumina (NEB) before submitting for sequencing on an Illumina MiSeq platform 537
generating paired end 150 bp reads. Raw sequencing reads were processed to remove Illumina 538
adapter and phiX sequence using BBduk with default parameters, and quality trimmed at 3’ ends 539
with Sickle using default parameters (https://github.com/najoshi/sickle). Assemblies were 540
conducted using IDBA-UD v1.1.116 with the following parameters: –pre_correction –mink 30 –541
maxk 140 –step 10. Following assembly, contigs smaller than 1 kbp were removed and open 542
reading frames (ORFs) were then predicted on all contigs using Prodigal v2.6.317. 16S ribosomal 543
rRNA genes were predicted using the 16SfromHMM.py script from the ctbBio python package 544
using default parameters (https://github.com/christophertbrown/bioscripts). Transfer RNAs were 545
predicted using tRNAscan-SE18. The full metagenome samples and their annotations were then 546
uploaded into our in-house analysis platform, ggKbase, where genomes were manually curated 547
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
27
via the removal of contaminating contigs based on aberrant phylogenetic signatures 548
(https://ggkbase.berkeley.edu). 549
For each ET-Seq experiment a genomic database is constructed using the ETdb 550
component of the ETsuite software package. Each database contains the nucleotide sequences 551
of the expected organisms in a sample, any vectors used, any conjugal donor, and the spike in 552
control organism. Briefly, all genomic sequences are formatted into a bowtie2 index to allow read 553
mapping, a tabular correspondence table between all scaffold names and their associated 554
genome is constructed (scaff2bin.txt), and a table (genome_info.txt) of standard genomic 555
statistics is calculated including genome size, GC content, and number of scaffolds. Following 556
database construction, a label is manually added to each entry in the genome info table to indicate 557
if the entry represents a target organism, a vector, or a spike in control organism. All data are 558
propagated into a single folder that can be used by the ETmapper software for downstream 559
mapping and analysis. 560
561
Identification and quantification of insertion junctions and barcodes 562
To identify and map transposon insertion junctions and their associated barcodes in a mixed 563
population of microbial cells, reads (150 bp X 2) generated from PCR amplicons of putative 564
transposon insertion junctions are first processed using the ETmapper component of the ETsuite 565
software package implemented in R with the following steps: First reads are quality trimmed at 566
the 3’ end to remove low quality bases (Phred score ≤ 20) and sequencing adapters using 567
Cutadapt v2.1019. Cutadapt is then used to identify and remove provided transposon model 568
sequences from the 5’ end of forward reads, requiring a match to 95% of the shortest transposon 569
sequence in a provided set and allowing a 2% error rate. Read pairs where no transposon model 570
sequence is identified in the forward read are discarded. All identified and trimmed transposon 571
models are paired with their respective reads, stored, and barcodes are identified in these 572
sequences by searching for a known primer binding site sequence flanking the 5’ end of the 573
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
28
barcode (5’-CTATAGGGGATAGATGTCCACGAGGTCTCT-3’) allowing for 1 mismatch. 574
Subsequently, the 20 bp region following the known primer binding site is extracted as the barcode 575
sequence and associated with its respective read. The 3’ end of the paired reverse reads are then 576
trimmed to remove any transposon model sequence using Cutadapt, and only read pairs where 577
one mate is at least ≥ 40 bp following all trimming are retained for downstream mapping and 578
analysis. The fully trimmed paired end reads now consisting of only genomic sequence following 579
the transposon insertion site are mapped to the ETdb database used in a given experiment using 580
bowtie2 with default options20. Mapped read files are converted into a hit table indicating the 581
mapped genome, scaffold, genomic coordinates, mapQ score, and number of alignment 582
mismatches for each read in a pair using a custom Python script, bam_pe_stats.py, provided with 583
ETsuite. This table is then merged with read-barcode assignments to generate a final hit table 584
with the mapping information about each read pair, the transposon model identified, and the 585
associated barcode found for that read pair. Finally mapped read pairs are only retained for 586
downstream quantification if both reads map to the same genome, at least one mapped read in a 587
pair has a mapQ score ≥ 20, and a barcode was successfully identified and associated with the 588
read pair. 589
To quantify the number of unique barcodes and their associated reads mapping to 590
organisms in each sample of an experimental run, the filtered hit tables were processed using the 591
ETstats component of the ET-Seq software package with the following steps: Initially, all barcodes 592
identified across all samples in an experiment are aggregated and clustered using Bartender21 593
with the following supplied options: -l 4 -s 1 -d 3. Barcode clusters and their associated 594
barcodes/reads were only retained if all of the following criteria were true: (1) ≥ 75% of the reads 595
in a cluster mapped to one genome (the majority genome), (2) ≥ 75% of the reads in a cluster 596
were associated with the same transposon model (the majority model), and (3) the barcode 597
cluster had at least 2 reads. Subsequently, when quantifying reads and barcodes in each sample 598
of an experiment, the genome a read was mapped to and the transposon model it was associated 599
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
29
with had to agree with the majority assignments for the barcode cluster assigned to that read’s 600
barcode to be counted. Finally, we were aware that Illumina patterned flow cell related index 601
swapping would result in reads from a barcode cluster being misassigned across samples, even 602
when using unique dual indexing22. We could not simply limit barcode clusters to be associated 603
with only one sample, as our spike in control organisms contain the same pool of barcodes and 604
are added to every sample. Thus we estimated an empirical index swap rate across each 605
experiment and required that the number of reads (X) for a barcode to be positively identified in 606
a sample be always ≥ 2 and ≥ the binomial mean of observed read counts expected in any sample 607
for a barcode cluster with (R) reads across (N) samples based on the estimated swap rate (S) + 608
2 standard deviations (Eqn. 1) 609
610
Eqn. 1: 𝑋 ≥ $𝑅 × '!"() + 2 × ,𝑅 × (1 − 𝑆) × 𝑆&𝑋 ≥ 2 611
612
The index swap rate for an experiment was empirically estimated from barcode clusters assigned 613
only to target organisms based on the assumption that it would be highly unlikely for a barcode 614
cluster to have truly originated from independent integration events into the same organism in 615
more than one sample. Thus we assumed that for each barcode cluster associated with target 616
organisms, the majority of reads originated from the true sample and reads assigned to other 617
samples represented swaps. This is opposed to barcode clusters associated with our spike-in 618
organism, conjugal donor organism, or vectors which contain the same pool of barcodes directly 619
added to multiple samples. To identify swapped read counts we first quantify the total count of all 620
reads assigned to the majority genome across barcode clusters but that are not associated with 621
the majority sample of that cluster (E). Then we quantify the total count of reads associated with 622
the majority genome and associated with the majority sample across all clusters (C). Then 623
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
30
experiment wide swap rate was estimated by dividing the total number of reads not associated 624
with majority samples by the total number of reads (Eqn. 2) 625
626
Eqn. 2: 𝑆 = #(#&')
627
628
Following filtering, a hit table is returned that indicates for each genome in each sample, the 629
number of unique barcode clusters that were recovered, and the total number of reads associated 630
with these barcodes. As a final check for false positives during ET-Seq development we included 631
an organism genome as an ETsuite mapping target, Sinorhizobium meliloti, which was not 632
physically included in our 9-member synthetic or thiocyanate-degrading communities. We did not 633
detect any barcodes or reads associated with this genome. 634
635
Metagenomic data processing and coverage calculation. 636
Each ET-Seq sample is split and in parallel undergoes shotgun metagenomic sequencing to 637
determine the relative quantities of organisms present in the sample at the time of sampling. Raw 638
read files from metagenomic data are also processed using the ETmapper component of the 639
ETsuite software package with the following steps: First reads are quality trimmed at the 3’ end 640
to remove low quality bases (Phred score ≤ 20) and sequencing adapters using Cutadapt v2.1019. 641
Read pairs where at least one mate is not ≥ 40 bp in length are discarded. Trimmed read pairs 642
are mapped to the ETdb database used in a given experiment using bowtie220 with default 643
parameters. Mappings are filtered to require a minimum identity ≥ 95% and minimum mapQ score 644
≥ 20, and coverage is calculated using a custom script, calc_cov.py, included with the ETsuite 645
software. 646
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
31
Metagenomic sequencing for one biological replicate of the thiocyanate-degrading bioreactor 647
community (Conjugation - Control Sample - Replicate 3) failed. Metagenomic coverage values for 648
this replicate were generated by averaging the values from the other two biological replicates. 649
650
ET-Seq normalization and calculation of insertion efficiency. 651
To account for differences in sequencing depth, transposon junction PCR template amount, and 652
relative abundance of microbes in a community the data generated from both ET-Seq and 653
shotgun metagenomics were each normalized independently to values from the spike in control 654
organism, B. thetaiotaomicron, and then ET-Seq data is subsequently normalized by 655
metagenomic abundance as follows: Initially read count tables from ET-Seq and metagenomics 656
are filtered to remove any ET-Seq read count associated with < 2 barcodes and any metagenomic 657
read count < 10 reads. Next a size factor for each sample is calculated based on the geometric 658
mean of B. thetaiotaomicron reads for ET-Seq samples and B. thetaiotaomicron coverage for 659
metagenomics samples. ET-Seq read counts and metagenomic coverage values are then divided 660
by their respective sample size factors to create normalized values. Normalized ET-Seq read 661
counts are then divided by their paired normalized metagenomic coverage values to generate ET-662
Seq read counts that are fully normalized to both ET-Seq sequencing depth and metagenomic 663
coverage. Finally fully normalized ET-Seq read counts for target organisms are divided by the 664
fully normalized ET-Seq read count of B. thetaiotaomicron from an experiment (a constant that 665
represents the number of reads that would be obtained from an organism with 100% of its 666
chromosomes carrying insertions). The resulting values for each target organism in a sample 667
represent an estimate of the fraction of that organism’s population that received insertions 668
(Insertion Efficiency). Additionally, we multiply a target organism’s insertion efficiency by the 669
fractional relative abundance of that organism in a sample, based on metagenomic data, to 670
estimate the fraction of an entire sample population that is made up of cells of a given species 671
that received insertions (Insertion-Receiving Fraction in Total Community). 672
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
32
673
ET-Seq validation and establishing limits of detection and quantification. 674
To validate ET-Seq and establish both a limit of detection (LOD) and limit of quantification (LOQ) 675
for the assay, a library of K. michiganensis transposon mutants was constructed by antibiotic 676
selection following conjugation with pHLL250 (as described above), and this library was added to 677
untransformed samples of the combined nine-member community to create a transformed cell 678
concentration gradient. Technical triplicate samples were created where 1%, 0.1%, 0.01%, 679
0.001% and 0% of the total K. michiganensis cells (by OD600) in the mixture were those derived 680
from the transformed library. All samples (n = 15) were subjected to ET-Seq (as described above), 681
and pooled samples across all concentrations for each technical triplicate (n = 3; 5 concentrations) 682
were analyzed for community composition using shotgun metagenomics (as described above). 683
ET-Seq insertion efficiencies and insertion-receiving fraction in total community values were 684
averaged across technical replicates. Additionally, to derive the fraction of transformed K. 685
michiganensis cells that made up the total community (not just the K. michiganensis sub-686
population), the known fraction of K. michiganensis cells that were transformed in a sample was 687
multiplied by the measured relative abundance of K. michiganensis in a given technical replicate, 688
and these values were averaged across technical replicates. 689
To derive the LOD and LOQ for ET-Seq a linear regression was performed using the lm 690
function in the base package of R23 using the known fraction of transformed K. michiganensis 691
cells that made up the total community as the independent variable and the ET-Seq estimated 692
per community insertion efficiency as the dependent variable. The sample where transformed K. 693
michiganensis made up 0% of the community was not included in the regression analysis, but 694
was reserved to demonstrate zero response with no transformed cells present. LOD was 695
calculated as 3.3 * standard error of the regression / slope. The LOQ was calculated as 10 * 696
standard error of the regression / slope. 697
698
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
33
Identification of positive transformations and statistical analysis 699
For all ET-Seq experiments conducted we initially determined if any ET-Seq estimated per 700
community insertion efficiency was larger than the LOD. Values larger than the LOD constituted 701
a positive detection. For comparative statistical analysis conducted to compare insertion 702
efficiencies between transformation methods (Fig. 2b) only values that had a corresponding 703
insertion-receiving fraction in total community > LOQ were used. Statistical testing was conducted 704
using Analysis of Variance (ANOVA) implemented in the aov function in R23. Post-hoc testing was 705
conducted using the TukeyHSD function in R. Traditional 95% confidence intervals were 706
calculated using the groupwiseMean function of the rcompanion package in R. 707
708
Multiple delivery experiments in communities. 709
To test multiple delivery methods on the nine-member community, all members were grown at 710
30°C with Bacillus sp. AnTP16 and Methylobacterium sp. UNC378MF in R2A liquid media while 711
all other members were inoculated in LB. Equal amounts of community members were then 712
combined by OD600. This consortium then underwent transformation (of pHLL250), conjugation 713
(pHLL250 in WM3064), and electroporation of the pHLL250 vector (described in Delivery Methods 714
section). After delivery the community was spun down at 5,000g for 10 minutes, washed once 715
with LB and then spun down and frozen at -80°C until genomic DNA extraction. 716
The thiocyanate-degrading microbial community was sampled for delivery testing from biofilm on 717
a four liter continuously stirred tank reactor that had been maintained at steady state for over a 718
year. The reactor is operated with a two day hydraulic residence time, sparged with laboratory air 719
at 0.9 L/min, and fed with a mixture of molasses (0.15% w/v), thiocyanate (250 ppm), and KOH 720
to maintain pH 7. OD measurements were not feasible on the biofilm so we used its wet mass to 721
approximate equivalent OD and thus cell numbers to those used for the nine-member community. 722
This community underwent the same transformation, electroporation, and conjugation delivery 723
approaches as the nine-member community, however in all steps requiring media, LB was 724
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
34
replaced with molasses media (no thiocyanate). After delivery the community was spun down at 725
5,000g for 10 minutes, washed once with molasses media and then spun down and frozen at -726
80°C until genomic DNA extraction. 727
728
Benchmarking DART systems in E. coli. 729
We first constructed several DART systems to identify variants capable of efficient transposition 730
by conjugative delivery to E. coli. We performed parallel conjugation of each DART vector variant 731
containing GmR Tn cargo (2.1 kbp) and either a non-targeting gRNA or one of two lacZ-targeting 732
gRNAs for each system. For VcDART, variation of the promoter controlling the expression of 733
VcCasTn components did not significantly impact transposition efficiency (Extended Data Fig. 4c-734
d). Similarly for ShDART, expression of the sgRNA in three distinct transcriptional configurations 735
did not significantly impact transposition efficiency (Extended Data Fig. 4e-f). Since promoter and 736
transcriptional configuration variation had insignificant effects on transposition efficiency--and to 737
remove the requirement for promoter induction and reliance on T7 RNA polymerase--we 738
performed target specificity benchmarking of VcDART and ShDART using the same constitutive 739
Plac promoter. In this experiment, ShDART Cas and Tns genes and sgRNA were encoded in the 740
original transcriptional configuration and under control of the same promoter in which ShCasTn 741
was first characterized by Strecker et al.3. 742
The lacZ-targeting gRNAs were designed to target the lacZ α-peptide present in the 743
conjugation recipient strain E. coli BL21(DE3) but absent in the lacZΔM15 strains used as cloning 744
host (E. coli EC100D-pir+) or conjugation donor (E. coli WM3064), preventing transposition until 745
delivery into the recipient cell (Extended Data Fig. 4a). Donor WM3064 strains were transformed 746
and cultivated as described above, and recipient BL21(DE3) was inoculated from glycerol stock 747
into 100 mL LB in a 250 mL baffled shake flask at 37°C 250 rpm. Conjugations were performed 748
as described above using LB medium and 37°C incubation for every step, except that 0.1 mM 749
IPTG was added to VcDART conjugation plates in Extended Data Fig. 4d to induce transcription 750
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
35
from PT7-lac and T7 RNA polymerase expression in E. coli BL21(DE3). Transposition efficiencies 751
were calculated as the percentage of colonies resistant to 10 μg mL-1 gentamycin relative to viable 752
colonies in absence of gentamycin. 753
On/off-target analysis was performed for one lacZ-targeting guide for each DART system 754
by outgrowth under selection followed by genomic DNA extraction and ET-Seq. Specifically, 755
approximately 10,000 transconjugant cfu were plated on LB agar with gentamycin, incubated at 756
37°C overnight, scraped from agar into liquid LB medium, diluted to OD600 = 0.25 into 10 mL LB 757
plus gentamycin in 50 mL conical tubes, incubated at 37°C 250 rpm until OD600 = 1.0, centrifuged 758
at 4,000g, and frozen for downstream analysis. To determine the percent of selectable transposed 759
colonies possessing on-target and off-target edits, the total number of selectable colonies was 760
adjusted (Extended Data Fig. 4b) for on-target and off-target percent as determined by ET-Seq 761
(Fig. 4b). ET-Seq analysis was conducted on triplicate platings of DART transconjugants (n = 3 762
for each system) to identify transposon insertion locations and quantify on-target vs. off-target 763
insertions. As the targeted genomic region encoding the lacZ α-peptide is duplicated in E. coli 764
BL21(DE3), one of the two duplicated regions (749,903 bp --> 750,380 bp) was removed prior to 765
analysis to allow unambiguous mapping assignment. Subsequently, the standard ETsuite 766
analysis pipeline (as described above) was used to identify and map 300 bp X 2 reads containing 767
transposon junctions back to the recipient BL21(DE3) genome and cluster barcodes that 768
corresponded to unique insertion events. To confirm an insert location we first identified the exact 769
transposon-genome junction mapping coordinate that was the most frequent in the reads of a 770
barcode cluster (prime location) then required that a barcode cluster had: (1) at least 75% of its 771
reads coming from within 3 bp of the prime location and (2) at least 75% of its reads mapping to 772
the same strand. If these criteria were true the barcode cluster was counted as a unique insertion 773
and the prime location was used as the mapping locus by ET-Seq. An on-target insertion was 774
evaluated as a barcode cluster with a prime location within 200 bp downstream of the 3’ end of 775
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
36
the protospacer target. Finally all distances reported from the protospacer target site were 776
calculated from the last base pair of the 3’ end of the protospacer. 777
778
VcDART-mediated targeted editing in a community. 779
VcDART vectors encoding constitutive VcCasTn, constitutive bla:aadA Tn cargo (2.7 kbp), and 780
either a non-targeting (pBFC0888), K. michiganensis M5aI pyrF-targeting (pBFC0825), or P. 781
simiae WCS417 pyrF-targeting (pBFC0837) constitutive crRNA were transformed into E. coli 782
WM3064. Conjugations of these vectors into the nine-member community were performed as 783
described above on filter-topped LB agar plates with 12 hr incubation at 30°C. Lawns were 784
scraped from filters into 10 mL LB medium, vortexed, and 1 OD600*mL from each lawn was plated 785
on LB agar supplemented with 1 mg mL-1 5-FOA, 100 μg mL-1 carbenicillin, 100 μg mL-1 786
streptomycin, and 100 μg mL-1 spectinomycin. Following 3 days of incubation at 30°C, all cells 787
were scraped from the agar into 10 mL R2A medium, vortexed, diluted into 10 mL R2A 788
supplemented with 20 mg mL-1 uracil (for no selection controls) or R2A with uracil, 5-FOA, 789
carbenicillin, streptomycin, and spectinomycin to OD600 = 0.02, and split evenly across 4 wells 790
(2.5 mL/well) of a 24 deep well plate. After cultivation at 30°C and 750 rpm for 1 week, only the 791
cultures conjugated with VcDART containing pyrF-targeting crRNA had grown in presence of 792
antibiotics and 5-FOA. A small portion of each of these cultures was serially diluted in R2A and 793
plated on LB agar plus antibiotics to isolate and assay colonies by targeted PCR and Sanger 794
sequencing of pyrF loci. The remainder of each culture was centrifuged at 4,000g for 10 min and 795
frozen at -80°C for downstream bacterial 16S rRNA V4 amplicon metagenomic sequencing 796
(Novogene). Relative abundances were calculated as described below (16S rRNA V4 amplicon 797
analysis) for pre-conjugation nine-member community cultures and post-selection pyrF-targeted 798
cultures. 799
800
16S rRNA V4 amplicon analysis 801
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
37
16S rRNA V4 amplicon sequencing was conducted using the 515F (5’-802
GTGCCAGCMGCCGCGGTAA-3’) and 806R (5’-GGACTACHVGGGTWTCTAAT-3’) universal 803
bacterial primer set to generate 250 bp x 2 reads (Novogene). Samples were processed using 804
the UPARSE pipeline within the USEARCH software package to merge read pairs, remove 805
primers, quality filter sequences, remove chimeras, identify unique sequence variants (ZOTUs), 806
and quantify their abundance across samples as described previously24. To assign ZOTU 807
sequences to species known to be in our community mixture, we queried all 890 identified ZOTUs 808
against a custom database of 16S sequences derived from the genomes of the nine-member 809
community constituents using USEARCH25 ZOTUs with 100% identity to a 16S sequence in our 810
database were assigned to the matched species, and all matches < 100% identity were counted 811
as “Other”. Counts coming from ZOTUs of the same taxonomic assignment were merged and the 812
relative abundance of a species was calculated as its read counts divided by the total read counts 813
for the sample. 814
815
Statistics and reproducibility 816
All transformations (natural transformation, conjugation, electroporation) and subsequent 817
analyses were performed for three independent replicates. 818
819
Reporting summary 820
Further information on research design is available in the Nature Research Reporting Summary 821
linked to this paper. 822
823
Data availability 824
Summary data for genomes, plasmids, and oligonucleotides used in this study can be found in 825
Supplementary tables 1-4. Sequence data for all genomes assembled as part of this study and 826
newly constructed plasmids are in submission to NCBI with accession numbers pending. 827
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
38
Sequence data for genomes taken from Huddy, et al8 are in submission to NCBI with accession 828
numbers pending. Sequence data for genomes taken from Kantor, et al9 are available under NCBI 829
BioProject accession no. PRJNA279279. All genomes and plasmids used in the project will also 830
be made available on ggKbase (https://ggkbase.berkeley.edu/). Raw count data for all 831
experiments including both metagenome and ET-seq information is available at 832
https://github.com/SDmetagenomics/ETsuite/tree/master/manuscript_data. 833
834
Code availability 835
Custom R scripts for ET-Seq analysis and code used in the construction of figures are available 836
at https://github.com/SDmetagenomics/ETsuite. 837
838
Methods references 839
1. Adler, B. A. et al. Systematic Discovery of Salmonella Phage-Host Interactions via High-840
Throughput Genome-Wide Screens. doi:10.1101/2020.04.27.058388. 841
2. Klompe, S. E., Vo, P. L. H., Halpin-Healy, T. S. & Sternberg, S. H. Transposon-encoded 842
CRISPR–Cas systems direct RNA-guided DNA integration. Nature 571, 219–225 (2019). 843
3. Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. 844
Science 365, 48–53 (2019). 845
4. Liu, H. et al. Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria. 846
mSystems 3, (2018). 847
5. Liu, H. et al. Large-scale chemical-genetics of the human gut bacterium Bacteroides 848
thetaiotaomicron. BioRxiv (2019). 849
6. Devon, R. S., Porteous, D. J. & Brookes, A. J. Splinkerettes--improved vectorettes for 850
greater efficiency in PCR walking. Nucleic Acids Res. 23, 1644–1645 (1995). 851
7. Barquist, L. et al. The TraDIS toolkit: sequencing and analysis for dense transposon mutant 852
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
39
libraries. Bioinformatics 32, 1109–1111 (2016). 853
8. Huddy, R. J. et al. Thiocyanate and organic carbon inputs drive convergent selection for 854
specific autotrophic Afipia and Thiobacillus strains within complex microbiomes. bioRxiv 855
2020.04.29.067207 (2020) doi:10.1101/2020.04.29.067207. 856
9. Kantor, R. S. et al. Genome-Resolved Meta-Omics Ties Microbial Dynamics to Process 857
Performance in Biotechnology for Thiocyanate Degradation. Environ. Sci. Technol. 51, 858
2944–2953 (2017). 859
10. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate 860
genomic comparisons that enables improved genome recovery from metagenomes through 861
de-replication. ISME J. 11, 2864–2868 (2017). 862
11. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: 863
assessing the quality of microbial genomes recovered from isolates, single cells, and 864
metagenomes. Genome Res. 25, 1043–1055 (2015). 865
12. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify 866
genomes with the Genome Taxonomy Database. Bioinformatics (2019) 867
doi:10.1093/bioinformatics/btz848. 868
13. Diamond, S. et al. Mediterranean grassland soil C-N compound turnover is dependent on 869
rainfall and depth, and is mediated by genomically divergent microorganisms. Nat Microbiol 870
4, 1356–1367 (2019). 871
14. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high 872
throughput. Nucleic Acids Res. 32, 1792–1797 (2004). 873
15. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective 874
stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 875
268–274 (2015). 876
16. Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for 877
single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 878
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
40
1420–1428 (2012). 879
17. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site 880
identification. BMC Bioinformatics 11, 119 (2010). 881
18. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA 882
genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997). 883
19. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. 884
EMBnet.journal 17, 10–12 (2011). 885
20. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 886
9, 357–359 (2012). 887
21. Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to 888
count barcode reads. Bioinformatics 34, 739–747 (2018). 889
22. Costello, M. et al. Characterization and remediation of sample index swaps by non-890
redundant dual indexing on massively parallel sequencing platforms. BMC Genomics 19, 891
332 (2018). 892
23. R Core Team. R: A language and environment for statistical computing. (2013). 893
24. Edgar, R. C. UPARSE: highly accurate OTU sequences from microbial amplicon reads. 894
Nat. Methods 10, 996–998 (2013). 895
25. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 896
26, 2460–2461 (2010). 897
898
Acknowledgments 899
We thank Morgan N. Price for data analysis input, Patrick Pausch for experimental advice, Shana 900
L. McDevitt, Eileen Wagner, and Hitomi Asahara for help with sequencing, and Trent R. Northen 901
for directional advice. Funding was provided by m-CAFEs Microbial Community Analysis & 902
Functional Evaluation in Soils, ([email protected]) a project led by Lawrence Berkeley National 903
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
41
Laboratory supported by the U.S. Department of Energy, Office of Science, Office of Biological & 904
Environmental Research under contract number DE-AC02-05CH11231. Support was also 905
provided by the Innovative Genomics Institute at UC Berkeley. B.E.R. and B.F.C. are supported 906
by the National Institute of General Medical Sciences of the National Institute of Health under 907
award numbers F32GM134694 and F32GM131654. Schematics were created with 908
BioRender.com. 909
910
Contributions 911
B.E.R., S.D., B.F.C, A.M.D., J.F.B, and J.A.D. conceived the work and designed the experiments. 912
B.E.R., B.F.C., C.H., M.X., Z.Z., D.C.S., K.T., T.K.O., and N.K. conducted the molecular biology 913
included. S.D., A.C.-C., C.H., and R.S. developed the bioinformatic analysis. B.E.R., S.D., B.F.C., 914
A.M.D., J.F.B., and J.A.D. analyzed and interpreted the data. 915
916
Competing Interests 917
The Regents of the University of California have patents pending related to this work on which 918
B.E.R., S.D., B.F.C., A.M.D., J.F.B., and J.A.D. are inventors. J.A.D. is a co-founder of Caribou 919
Biosciences, Editas Medicine, Intellia Therapeutics, Scribe Therapeutics and Mammoth 920
Biosciences, a scientific advisory board member of Caribou Biosciences, Intellia Therapeutics, 921
eFFECTOR Therapeutics, Scribe Therapeutics, Synthego, Mammoth Biosciences and Inari, and 922
is a Director at Johnson & Johnson and has sponsored research projects by Biogen, Roche and 923
Pfizer. J.F.B. is a founder of Metagenomi. 924
925
Additional Information 926
Correspondence and request for materials should be addressed to J.A.D. and J.F.B. 927
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
42
Extended Data 928
929
Extended Data Fig. 1 | Library preparation and data normalization for ET-Seq. a, ET-Seq requires low-930
coverage metagenomic sequencing and customized insertion sequencing. Insertion sequencing relies on 931
custom splinkerette adaptors, which minimize non-specific amplification, a digestion step for degradation 932
of delivery vector containing fragments, and nested PCR to enrich for fragments containing insertions with 933
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
43
high specificity. The second round of nested PCR adds unique dual index adaptors for Illumina sequencing. 934
b, This insertion sequencing data is first normalized by the reads to internal standard DNA which is added 935
equally to all samples and serves to correct for variation in reads produced per sample. Secondly, it is 936
normalized by the relative metagenomic abundances of the community members. 937
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
44
938
Extended Data Fig. 2 | ET-Seq determined insertion efficiencies for all nine consortium members as 939
a fraction of the entire community. ET-Seq determined insertion efficiencies for conjugation, 940
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
45
electroporation, and natural transformation on the nine-member synthetic community (n = 3 biological 941
replicates). The values shown are the estimated fraction a constituent species's transformed cells make of 942
the total community population. Control samples received no exogenous DNA. Average relative abundance 943
across all samples is indicated in parentheses (n = 18 independent samples). LOD and LOQ are indicated 944
in plots by short and long dashed lines respectively. 945
946
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
46
947
948
Extended Data Fig. 3 | ET-Seq determined insertion efficiencies for all thiocyanate-degrading 949
bioreactor community members as a fraction of the entire community. ET-Seq determined insertion 950
efficiencies for conjugation, electroporation, and natural transformation on the thiocyanate-degrading 951
bioreactor community (n = 3 biological replicates). The values shown are the estimated fraction a 952
constituent species’s transformed cells make of the total community population. Control samples received 953
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
47
no exogenous DNA. Average relative abundance across all samples is indicated in parentheses (n = 17 954
independent samples; due to a single failed metagenomic sequencing replicate, see methods). LOD and 955
LOQ are indicated in plots by short and long dashed lines respectively. 956
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
48
957
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint
49
Extended Data Fig. 4 | Benchmarking DART vectors. a, E. coli WM3064 to E. coli BL21(DE3) conjugation, 958
transposition, and selection schematic (top) and guide RNAs targeting the lacZ α-fragment of recipient BL21(DE3), 959
which is absent from donor WM3064 (bottom). b,d,f, Percent selectable transposed colonies is calculated as the 960
number of colonies obtained with gentamycin selection divided by total viable colonies in absence of selection. b, 961
Insertion receiving colonies divided into on- and off-targeted. This was calculated by multiplying % selectable colonies 962
for representative guides in d and f (highlighted by grey bars) by the on- or off-target rates (shown in Fig. 4). c, 963
Transposition with VcDART was tested with three promoters. The variant using the Plac promoter, harvested from 964
pHelper_ShCAST_sgRNA18, was also used for Fig. 4, 5, and Extended Data Fig. 4b (*). d, Efficiencies of VcDART 965
using various promoters. e, Transposition with ShDART was tested with three transcriptional configurations, all using 966
Plac18. The configuration used for characterization of ShCasTn originally18 was also used for Fig. 4 and Extended Data 967
Fig. 4b (*). f, Efficiencies of ShDART using various promoters. b, d, f, Crossbar indicates mean and error bars indicate 968
one standard deviation from the mean (n = 3 biological replicates). Guide RNAs ending in “NT” are non-targeting 969
negative control samples. 970
.CC-BY-NC 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 21, 2020. ; https://doi.org/10.1101/2020.07.17.209189doi: bioRxiv preprint