minimal pam specificity of a highly similar spcas9 ortholog · the pam for sccas9 by first mapping...

11
MICROBIOLOGY Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC). Minimal PAM specificity of a highly similar SpCas9 ortholog Pranam Chatterjee 1,2 * , Noah Jakimo 1,2 *, Joseph M. Jacobson 1,2 RNA-guided DNA endonucleases of the CRISPR-Cas system are widely used for genome engineering and thus have numerous applications in a wide variety of fields. CRISPR endonucleases, however, require a specific protospacer adjacent motif (PAM) flanking the target site, thus constraining their targetable sequence space. In this study, we demonstrate the natural PAM plasticity of a highly similar, yet previously uncharacterized, Cas9 from Streptococcus canis (ScCas9) through rational manipulation of distinguishing motif insertions. To this end, we report affinity to minimal 5-NNG-3PAM sequences and demonstrate the accurate editing capabilities of the ortholog in both bac- terial and human cells. Last, we build an automated bioinformatics pipeline, the Search for PAMs by ALignment Of Targets (SPAMALOT), which further explores the microbial PAM diversity of otherwise overlooked Streptococcus Cas9 orthologs. Our results establish that ScCas9 can be used both as an alternative genome editing tool and as a functional platform to discover novel Streptococcus PAM specificities. INTRODUCTION RNA-guided endonucleases of the CRISPR-Cas system, such as Cas9 (1) and Cpf1 (also known as Cas12a) (2), have proven to be versatile tools for genome editing and regulation (3), which have numerous implications in medicine, agriculture, bioenergy, food security, nano- technology, and beyond (4). The range of targetable sequences is limited, however, by the need for a specific protospacer adjacent motif (PAM), which is determined by DNA-protein interactions, to im- mediately follow the DNA sequence specified by the single-guide RNA (sgRNA) (1, 58). For example, the most widely used variant, Streptococcus pyogenes Cas9 (SpCas9), requires a 5-NGG-3motif downstream of its RNA-programmed DNA target (1, 48). To relax this constraint, additional Cas9 and Cpf1 variants with distinct PAM requirements have been either discovered (913) or engineered (1215) to diversify the range of targetable DNA sequences. In total, these studies have provided only a handful of CRISPR effectors with mini- mal PAM requirements that enable wide targeting capabilities. To help augment this list, we characterize an orthologous Cas9 pro- tein from S. canis (ScCas9; UniProt I7QXF2), having 89.2% sequence similarity to SpCas9. We find that, despite this homology, ScCas9 prefers a more minimal 5-NNG-3PAM. To explain this divergence, we identify two notable insertions within its open reading frame (ORF) that differentiate ScCas9 from SpCas9 and contribute to its PAM-recognition flexibility. We show that ScCas9 can efficiently and accurately edit genomic DNA in mammalian cells and investigate possible explanations for PAM divergence between Streptococcus orthologs. Last, we construct a bioinformatics pipeline to explore the PAM specificities of other Streptococcus orthologs. RESULTS Identification of SpCas9 homologs While numerous Cas9 homologs have been sequenced, only a hand- ful of Streptococcus orthologs have been characterized or functionally validated. To explore this space, we curated all Streptococcus Cas9 pro- tein sequences from UniProt (16), performed global pairwise align- ments using the BLOSUM62 scoring matrix (17), and calculated percent sequence homology to SpCas9. From them, ScCas9 stood out, not only because of its remarkable sequence homology (89.2%) to SpCas9 but also due to a positive-charged insertion of 10 amino acids within the highly conserved REC3 domain, in positions 367 to 376 (Fig. 1A). Exploiting both of these properties, we modeled the insertion within the corresponding domain of PDB 4OO8 (18) and, when viewed in PyMOL, noticed that it formed a loop-like structure, of which several of its positive-charged residues come in close prox- imity with the target DNA near the PAM (Fig. 1B). We further iden- tified an additional insertion of two amino acids (KQ) immediately upstream of the two critical arginine residues necessary for PAM binding (19) in positions 1337 and 1338 (Fig. 1A). We thus hypothe- sized that these insertions may affect the PAM specificity of this en- zyme. To support this prediction, we computationally characterized the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome (20) to viral and plasmid genomes using BLAST (21), extracting the se- quences 3to the mapped protospacers, and subsequently generating a WebLogo (22) representation of the aligned PAM sequences. Our analysis suggested 5-NNGTT-3PAM (Fig. 1C). Intrigued by these novel motifs and motivated by the potentially reduced specificity at position 2 of the PAM sequence, we selected ScCas9 as a candidate for further PAM characterization and engineering. Determination of PAM sequences recognized by ScCas9 Because of the relatively low number of protospacer targets, we vali- dated the PAM binding sequence of ScCas9 using an existent positive selection bacterial screen based on green fluorescent protein (GFP) expression conditioned on PAM binding, termed PAM-SCANR (23). A plasmid library containing the target sequence, followed by a ran- domized 5-NNNNNNNN-3(8N) PAM sequence, was bound by a nuclease-deficient ScCas9 (and dSpCas9 as a control) and an sgRNA both specific to the target sequence and general for SpCas9 and ScCas9, allowing for the repression of lacI and expression of GFP. Plasmid DNA from fluorescence-activated cell sorting (FACS)sorted GFP- positive cells and presorted cells were extracted and amplified, and enriched PAM sequences were identified by Sanger sequencing and visualized using DNA chromatograms. Our results provide initial 1 Center for Bits and Atoms, Massachusetts Institute of Technology, Cambridge, MA, USA. 2 Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA. *These authors contributed equally to the work. Corresponding author. Email: [email protected] SCIENCE ADVANCES | RESEARCH ARTICLE Chatterjee et al., Sci. Adv. 2018; 4 : eaau0766 24 October 2018 1 of 10 on February 17, 2021 http://advances.sciencemag.org/ Downloaded from

Upload: others

Post on 05-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

MICROB IOLOGY

1Center for Bits and Atoms, Massachusetts Institute of Technology, Cambridge, MA,USA. 2Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.*These authors contributed equally to the work.†Corresponding author. Email: [email protected]

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

Copyright © 2018

The Authors, some

rights reserved;

exclusive licensee

American Association

for the Advancement

of Science. No claim to

originalU.S. Government

Works. Distributed

under a Creative

Commons Attribution

NonCommercial

License 4.0 (CC BY-NC).

D

Minimal PAM specificity of a highly similarSpCas9 orthologPranam Chatterjee1,2*†, Noah Jakimo1,2*, Joseph M. Jacobson1,2

RNA-guidedDNA endonucleases of the CRISPR-Cas system arewidely used for genome engineering and thus havenumerous applications in a wide variety of fields. CRISPR endonucleases, however, require a specific protospaceradjacent motif (PAM) flanking the target site, thus constraining their targetable sequence space. In this study, wedemonstrate the natural PAM plasticity of a highly similar, yet previously uncharacterized, Cas9 from Streptococcuscanis (ScCas9) through rational manipulation of distinguishing motif insertions. To this end, we report affinity tominimal 5′-NNG-3′ PAM sequences and demonstrate the accurate editing capabilities of the ortholog in both bac-terial and human cells. Last, we build an automated bioinformatics pipeline, the Search for PAMs by ALignment OfTargets (SPAMALOT), which further explores the microbial PAM diversity of otherwise overlooked StreptococcusCas9 orthologs. Our results establish that ScCas9 can be used both as an alternative genome editing tool and as afunctional platform to discover novel Streptococcus PAM specificities.

own

on February 17, 2021

http://advances.sciencemag.org/

loaded from

INTRODUCTIONRNA-guided endonucleases of the CRISPR-Cas system, such as Cas9(1) and Cpf1 (also known as Cas12a) (2), have proven to be versatiletools for genome editing and regulation (3), which have numerousimplications in medicine, agriculture, bioenergy, food security, nano-technology, and beyond (4). The range of targetable sequences islimited, however, by the need for a specific protospacer adjacentmotif(PAM), which is determined by DNA-protein interactions, to im-mediately follow the DNA sequence specified by the single-guideRNA (sgRNA) (1, 5–8). For example, the most widely used variant,Streptococcus pyogenes Cas9 (SpCas9), requires a 5′-NGG-3′ motifdownstream of its RNA-programmed DNA target (1, 4–8). To relaxthis constraint, additional Cas9 and Cpf1 variants with distinct PAMrequirements have been either discovered (9–13) or engineered (12–15)to diversify the range of targetable DNA sequences. In total, thesestudies have provided only a handful of CRISPR effectors with mini-mal PAM requirements that enable wide targeting capabilities.

Tohelp augment this list, we characterize an orthologousCas9 pro-tein from S. canis (ScCas9; UniProt I7QXF2), having 89.2% sequencesimilarity to SpCas9. We find that, despite this homology, ScCas9prefers a more minimal 5′-NNG-3′ PAM. To explain this divergence,we identify two notable insertions within its open reading frame(ORF) that differentiate ScCas9 from SpCas9 and contribute to itsPAM-recognition flexibility. We show that ScCas9 can efficientlyand accurately edit genomic DNA in mammalian cells and investigatepossible explanations for PAM divergence between Streptococcusorthologs. Last, we construct a bioinformatics pipeline to explorethe PAM specificities of other Streptococcus orthologs.

RESULTSIdentification of SpCas9 homologsWhile numerous Cas9 homologs have been sequenced, only a hand-ful of Streptococcus orthologs have been characterized or functionallyvalidated. To explore this space, we curated all StreptococcusCas9 pro-

tein sequences from UniProt (16), performed global pairwise align-ments using the BLOSUM62 scoring matrix (17), and calculatedpercent sequence homology to SpCas9. From them, ScCas9 stoodout, not only because of its remarkable sequence homology (89.2%)to SpCas9 but also due to a positive-charged insertion of 10 aminoacids within the highly conserved REC3 domain, in positions 367 to376 (Fig. 1A). Exploiting both of these properties, we modeled theinsertion within the corresponding domain of PDB 4OO8 (18) and,when viewed in PyMOL, noticed that it formed a “loop”-like structure,of which several of its positive-charged residues come in close prox-imity with the target DNA near the PAM (Fig. 1B). We further iden-tified an additional insertion of two amino acids (KQ) immediatelyupstream of the two critical arginine residues necessary for PAMbinding (19) in positions 1337 and 1338 (Fig. 1A). We thus hypothe-sized that these insertions may affect the PAM specificity of this en-zyme. To support this prediction, we computationally characterizedthe PAM for ScCas9 by first mapping spacer sequences from theCas9-associated type II CRISPR loci in the S. canis genome (20) toviral and plasmid genomes using BLAST (21), extracting the se-quences 3′ to the mapped protospacers, and subsequently generatinga WebLogo (22) representation of the aligned PAM sequences. Ouranalysis suggested 5′-NNGTT-3′ PAM (Fig. 1C). Intrigued by thesenovel motifs and motivated by the potentially reduced specificity atposition 2 of the PAM sequence, we selected ScCas9 as a candidatefor further PAM characterization and engineering.

Determination of PAM sequences recognized by ScCas9Because of the relatively low number of protospacer targets, we vali-dated the PAM binding sequence of ScCas9 using an existent positiveselection bacterial screen based on green fluorescent protein (GFP)expression conditioned on PAM binding, termed PAM-SCANR (23).A plasmid library containing the target sequence, followed by a ran-domized 5′-NNNNNNNN-3′ (8N) PAM sequence, was bound by anuclease-deficient ScCas9 (and dSpCas9 as a control) and an sgRNAboth specific to the target sequence and general for SpCas9 and ScCas9,allowing for the repression of lacI and expression of GFP. PlasmidDNA from fluorescence-activated cell sorting (FACS)–sorted GFP-positive cells and presorted cells were extracted and amplified, andenriched PAM sequences were identified by Sanger sequencing andvisualized using DNA chromatograms. Our results provide initial

1 of 10

Page 2: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

on February 17, 2021

http://advances.sciencemag.org/

Dow

nloaded from

evidence that ScCas9 can bind to theminimal 5′-NNG-3′PAM, distinctto that of SpCas9’s 5′-NGG-3′ (Fig. 2A).

We hypothesized that the previously described insertions maycontribute to this flexibility, and thus engineered ScCas9 to removeeither insertion or both, and subjected these variants to the samescreen. Only removing the loop (ScCas9 D367-376 or ScCas9 DLoop)extended the PAM of ScCas9 to 5′-NAG-3′, with reduced specificityfor C and G at position 2, while only removing the KQ insertion(ScCas9 D1337-1338 or ScCas9 DKQ) reverted its specificity to a more5′-NGG-3′–like PAM, with reduced specificity for A at position 2(Fig. 2A). Last, the most SpCas9-like variant, where both insertions

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

are removed (ScCas9 D367-376 D1337-1338 or ScCas9 DLoop DKQ),expectedly reverts its specificity back to 5′-NGG-3′ (Fig. 2A). Thus,from a functional perspective, these insertions operate in tandem toreduce the specificity of ScCas9 from the canonical 5′-NGG-3′ PAMto a more minimal 5′-NNG-3′.

To confirm the results of the library assay and to rule out limitingdownstream requirements, we decided to elucidate the minimal PAMrequirements of ScCas9 by using fixed PAM sequences. We replacedthe PAM library with individual PAMsequences, whichwere varied atpositions 2, 4, and 5, to test each possible base. Our results demon-strate that, while ScCas9 exhibits no clear additional base dependence

Fig. 1. In silico characterization of ScCas9. (A) Global pairwise sequence alignment of SpCas9 and ScCas9. Despite sharing 89.2% sequence homology to SpCas9,ScCas9 contains two notable insertions, one positive-charged insertion in the REC domain (367 to 376) and another KQ insertion in the PAM-interacting domain (PID;1337 and 1338), as indicated. (B) Insertion of novel REC motif into PDB 4OO8 (18). The 367 to 376 insertion demonstrates a loop-like structure (red). Several of itspositive-charged residues (yellow) come in close proximity to the target DNA near the PAM (green). (C) WebLogo (22) for sequences found at the 3′ end of protospacertargets identified in plasmid and viral genomes using type II spacer sequences within S. canis as BLAST (21) queries.

Fig. 2. PAM determination of engineered ScCas9 variants. (A) PAM binding enrichment on a 5′-NNNNNNNN-3′ (8N) PAM library. PAM profiles are represented bySanger sequencing chromatograms via amplification of PAM region following plasmid extraction of GFP+ E. coli cells. (B) Examination of PAM preference for ScCas9. Forindividual PAMs, all four bases were iterated at a single position (2, 4, 5). Each PAM-containing plasmid was electroporated in duplicates, subjected to FACS analysis, andgated for GFP expression. Subsequently, GFP expression levels were averaged. SD was used to calculate error bars, and statistical significance analysis was conductedusing a two-tailed Student’s t test as compared to the negative control.

2 of 10

Page 3: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

with activity for all base iterations at each position, ScCas9 DLoopDKQ demonstrates significant binding at 5′-NGG-3′ PAM sequencesand at some, but not all, 5′-NNGNN-3′ motifs, indicating an inter-mediate PAM specificity between that of SpCas9 and ScCas9 (Fig. 2B).

Assessment of ScCas9 PAM specificity in human cellsWe compared the PAMspecificity of ScCas9 to SpCas9 in human cellsby cotransfecting human embryonic kidney–293T (HEK293T) cellswith plasmids expressing these variants along with sgRNAs directedto a native genomic locus (VEGFA) with varying PAM sequences(table S1). We first tested editing efficiency at a site containing anoverlapping PAM (5′-GGGT-3′). After 48 hours posttransfection,gene modification rates as detected by the T7E1 assay demonstratedcomparable editing activities of SpCas9, ScCas9, and ScCas9 DLoopDKQ (Fig. 3, A and B). In addition, we constructed sgRNAs to sites

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

with various nonoverlapping 5′-NNGN-3′ PAM sequences. WhileSpCas9’s cleavage activity was impaired at other non–5′-NGG-3′ se-quences (Fig. 3) (24), ScCas9 maintained comparable activity to thatof SpCas9 on its 5′-NGG-3′ target across all tested targets with 5′-NNGN-3′ PAM sequences (Fig. 3). Consistent with our bacterialdata, ScCas9 DLoop DKQ was able to cleave at the 5′-NGG-3′ target,along with significant activity on the 5′-NNGA-3′ target, with reducedgene modification levels at all other 5′-NNGN-3′ targets (Fig. 3).Overall, these results verify that ScCas9 can serve as an effective alter-native to SpCas9 for genome editing inmammalian cells, both at over-lapping 5′-NGG-3′ and more minimal 5′-NNGN-3′ PAM sequences.

We further assessed the PAM specificity of ScCas9 base editorsusing a synthetic traffic light reporter (TLR) (25) plasmid, containingan early stop codon upstream of a GFP ORF and downstream of anmCherry ORF. Successful A→G base editing using the ABE(7.10)

on February 17, 2021

http://advances.sciencemag.org/

Dow

nloaded from

Fig. 3. ScCas9 PAM specificity in human cells. (A) T7E1 analysis of indels produced at VEGFA loci with indicated PAM sequences. The Cas9 used is indicated aboveeach lane. All samples were performed in biological duplicates. As a background control, SpCas9, ScCas9, and ScCas9 DLoop DKQ were transfected without targetingguide RNA vectors [(−) guide control]. (B) Quantitative analysis of T7E1 products. Unprocessed gel images were quantified by line scan analysis using Fiji (41), the totalintensity of cleaved bands were calculated as a fraction of total product, and percent gene modification was calculated. All samples were performed in duplicates, andquantified modification values were averaged. SD was used to calculate error bars, and statistical significance analysis was conducted using a two-tailed Student’s t testas compared to the negative control. (C) ScCas9-mediated A→G base editing. GFP+ cells were calculated as a percentage of mCherry+ (RFP+) cells for indicated PAMsequences using the TLR (25) with an early stop codon. All samples were performed in duplicates, and quantified percentages were averaged. SD was used to calculateerror bars, and statistical significance analysis was conducted using a two-tailed Student’s t test. RFP+, red fluorescent protein–positive.

3 of 10

Page 4: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

Dow

nload

architecture, as described in Gaudelli et al. (26), converts an early, in-frame TAG stop codon to a TGG tryptophan codon, thus restoringGFP expression. After gating cells based on mCherry expression, weobserved significant base-editing efficiency at all 5′-NNGN-3′ targetPAM sequences for ScCas9-ABE(7.10), as compared to the SpCas9-ABE(7.10) architecture, which only demonstrates significant A→Gconversion on the standard 5′-NGG-3′ and tolerated 5′-NAG-3′mo-tifs in this assay (Fig. 3C).

Off-target analysis of ScCas9Wenext sought to evaluate the accuracy of this enzyme in comparisonto SpCas9. We used previous genome-wide analysis of SpCas9 target-ing accuracy to select three genomic targets (VEGFA site 3, FANCFsite 2, and DNMT1 site 4) that have multiple off-target sites on whichSpCas9 demonstrates activity (27). Each of these three sites addition-ally has a single off-target that has beenparticularly difficult tomediatevia engineering of high-fidelity Cas9 variants (28–30). We first ana-lyzed ScCas9’s activity on these off-targets. After cotransfection ofsgRNAs to the three aforementioned sites alongside both SpCas9and ScCas9, we amplified genomic DNA flanking, both the on-target

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

and difficult off-target sequences, to assess their genomemodificationactivities. Consistent with previously reported data (30), SpCas9 de-monstrated high off-to-on targeting on all three examined targets(Fig. 4A). ScCas9 demonstrated comparable on-target activities forthe three targets, but exhibited negligible activity on the VEGFA site 3and DNMT1 site 4 off-targets, and a nearly 1.5-fold decrease in off-to-on target ratio for FANCF site 2 (Fig. 4A), suggesting improvedaccuracy over SpCas9 on overlapping 5′-NGG-3′ targets.

To examine ScCas9’s accuracy across itswider PAMtargeting range,we used a mismatch tolerance assay (30) on target sequences with5′-NAG-3′, 5′-NCG-3′, 5′-NGG-3′, and 5′-NTG-3′ PAMs.We gener-ated sgRNAs containing both single and adjacent double mismatchesat every other base across each of the four on-target sequences (table S1)and subsequently measured the genome modification efficiencies forthese mismatched sgRNAs. Our results demonstrate that ScCas9 gen-erally tolerates single mismatches better than double mismatches foreach analyzed spacer position and is similarly less likely to toleratemismatches within the seed region of the crRNA, though with greatersensitivity than SpCas9 (Fig. 4B). Across all of the four PAM targets,ScCas9 does, however, tolerate mismatches within the middle of the

on February 17, 2021

http://advances.sciencemag.org/

ed from

Fig. 4. ScCas9 performance as a genome editing tool. (A) Quantitative analysis of T7E1 products for indicated genomic on- and off-target editing. All samples wereperformed in duplicates, and quantifiedmodification values were averaged. SD was used to calculate error bars, and statistical significance analysis was conducted using atwo-tailed Student’s t test as compared to each negative control (table S2). Mismatched positions within the spacer sequence are highlighted in red. (B) Efficiency heatmapof mismatch tolerance assay. Quantified modification efficiencies, as assessed by the T7E1 assay, are exhibited for each labeled single or double mismatch in the sgRNAsequence for each indicated PAM (table S1). (C) Dot plot of on-target modification percentages at various gene targets for indicated PAM, as assessed by the T7E1 assay.Duplicate modification percentages were averaged (table S2). (D) Genomic base-editing characterization. For each indicated PAM, a representative Sanger sequencingchromatogram is shown, demonstrating themost efficiently edited base in the target sequence. Percent edited values, as quantified by BEEP in comparison to an uneditednegative control, were averaged, and SD was subsequently calculated.

4 of 10

Page 5: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

on February 17, 2021

http://advances.sciencemag.org/

Dow

nloaded from

crRNA sequence, with highest efficiencies reported for the 5′-NTG-3′target. SpCas9 expectedly demonstrates negligible genome modifica-tion activity on the 5′-NCG-3′ and 5′-NTG-3′ targets but weakly tol-erates single and doublemismatches across the entire crRNAsequence,with reduced tolerance in the seed region for the standard 5′-NGG-3′target, corroborating previous mismatch tolerance studies (30). Last,ScCas9 exhibits a mismatch tolerance profile similar to SpCas9 onthe 5′-NAG-3′ target, albeit with a higher reported on-target efficien-cy (Fig. 4B).

ScCas9 genome editing capabilitiesFinally, to establish ScCas9 as a useful genome editing tool, we evaluatedits ability to modify a variety of gene targets for a handful of differentPAM sequences. We constructed sgRNAs to 24 targets within nine en-dogenous genes in HEK293T cells and evaluated on-target gene mod-ification using the T7E1 assay. Our results demonstrate that ScCas9maintains efficiencies comparable to that of SpCas9 on 5′-NGG-3′ se-quences and on selected 5′-NNG-3′ PAM targets (Fig. 4C), supportingour previous findings (Fig. 3B). SpCas9 expectedly performs efficientlyon 5′-NGG-3′ and weakly on 5′-NAG-3′ targets but demonstratesnegligible editing capabilities on 5′-NCG-3′ and 5′-NTG-3′ PAM se-quences (Fig. 4C), as previously demonstrated. Notably, ScCas9 per-formed less effectively on selected target sequences in the hemoglobinsubunit d (HBD) gene while demonstrating higher efficiencies on 5′-NNG-3′ sequences inVEGFA andDNMT1, for example. This variationin efficiency within each PAM group and across different genes indi-cates that proper target selection within specified genomic regions iscritical for successful ScCas9-mediated gene modification (Fig. 4C).

We subsequently measured the efficacy of ScCas9 integrated with-in the BE3 (31) and ABE(7.10) (26) base-editing architectures on en-dogenous genomic loci. To evaluate the efficiency of base-editingactivities, we developed a simple, easy-to-use Python program, termedthe Base Editing Evaluation Program (BEEP), that takes as input botha negative control ab1 Sanger sequencing file and the edited sampleab1 file and outputs the efficiency of an indicated base conversion ata specific position (read from 5′ to 3′) along the target sequence. BEEPanalysis on ab1 files, following transfection of ScCas9 base editors, ge-nomic amplification, and subsequent Sanger sequencing, demonstratesthat ScCas9 is capable ofmediating C→T andA→Gbase conversion atboth overlapping 5′-NGG-3′ and nonoverlapping 5′-NNG-3′ PAM se-quences (Fig. 4D).While ScCas9 base editors perform efficiently on thenon–5′-NGG-3′ targets, as compared to SpCas9 (Figs. 3C and 4D),ScCas9 is less effective at editing 5′-NGG-3′ genomic targets thanSpCas9 for both architectures (Fig. 4D), indicating that further develop-ment is necessary for broad usage of ScCas9 base editors.

Investigation of sequence conservation between S. canisand other streptococcus Cas9 orthologsTo further investigate the distinguishing motif insertions in ScCas9,we inserted the loop (SpCas9 ::Loop), the KQ motif (SpCas9 ::KQ),or both (SpCas9 ::Loop ::KQ) into the SpCas9 ORF and analyzedbinding on the 8N library using PAM-SCANR. Of these variants, onlySpCas9 ::KQ showed target binding affinity in the PAM-SCANR assay.Sequencing on enriched GFP-expressing cells demonstrated an un-affected preference for 5′-NGG-3′ (Fig. 5A). FACS analysis on a fixed5′-TGG-3′ PAM confirmed these binding profiles, with SpCas9 ::KQyielding half the fraction of GFP-positive cells compared to SpCas9(Fig. 5B). These data, in conjunction with the binding profiles of ScCas9variants (Fig. 2), suggest that, while these insertions within ScCas9 do

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

distinguish its PAM preference from SpCas9, other sequence featuresof ScCas9 also contribute to its divergence.

S. canis has been reported to infect dogs, cats, cows, and humansand has been implicated as an adjacent evolutionary neighbor ofS. pyogenes, as evidenced by various phylogenetic analyses (20, 32, 33).In addition to sharing common hosts, we identified S. canis CRISPRspacers that map to phage lysogens in S. pyogenes genomes, which sug-gests that they are overlapping viral hosts as well. This close evolution-ary relationship has manifested itself in the sequence homology ofScCas9 and SpCas9, among other orthologous genes, predicted to bea result of lateral gene transfer (LGT) (20, 32, 33). Nonetheless, fromthe alignment of SpCas9 and ScCas9, the first 1240 positions score with93.5% similarity and the last 144 positions scorewith 52.8%. To accountfor the exceptional divergence in the PID at the C terminus of ScCas9and the positive-charged inserted loop, we focused on alignment of thedistinguishing sequences of ScCas9 to other Streptococcus Cas9 ortho-logs (Fig. 5C). Notably, the loop motif is present in certain orthologs,such as those from S. gordonii, S. anginosus, and S. intermedius, whilethe ScCas9 PID is mostly composed of disjoint sequences from otherorthologs, such as those from S. phocae, S. varani, and S. equinis. Addi-tional LGT events between these orthologs, as opposed to isolateddivergence, more likely explain the differences between ScCas9 andSpCas9. Our demonstration that two insertion motifs in ScCas9 alterPAMpreferences, yet do not abolish PAMbinding when removed, sug-gests other functional evolutionary intermediates in the formation ofeffective PAM preferences.

Genus-wide prediction of divergent StreptococcusCas9 PAMsDemonstrations of efficient genome editing by Cas9 nucleases withdistinct PAM specificity from several Streptococcus species, includingS. canis, motivated us to develop a bioinformatics pipeline for dis-covering additional Cas9 proteins with novel PAM requirements inthe Streptococcus genus. We call this method the Search for PAMsby ALignment Of Targets (SPAMALOT). Briefly, we mapped a 20-ntportion of spacers flanked by known Streptococcus repeat sequences tocandidate protospacers that align with no more than two mismatchesin phages associatedwith the genus (34).We grouped 12-nt protospacer3′-adjacent sequences from each alignment by genome and CRISPRrepeat and then generated groupWebLogos (22) to compute presumedPAM features. Figure 6A shows that resultingWebLogos accurately re-flect the known PAM specificities of ScCas9 (this work), S. pyogenes,S. thermophilus, and S. mutans (7, 35, 36). We identified a notable di-versity in theWebLogo plots derived fromvarious S. thermophilus cas-settes with common repeat sequences (Fig. 6B), each of which couldoriginate from any other such S. thermophilusWebLogo upon subtlespecificity changes that traverse intermediateWebLogos among them.Weobserve a similar relationship between two S. oralisWebLogos thatalso share this repeat and unique putative PAM specificities associatedwith CRISPR cassettes containing S. mutans–like repeats from theS. oralis, S. equinis, and S. pseudopneumoniae genomes (Fig. 6C).

DISCUSSIONAs the growth and development of CRISPR technologies continue, therange of targetable sequences remains limited by the requirement for aPAM sequence flanking a given target site. While significant discoveryand engineering efforts have been undertaken to expand this range(9–15), there are still only a handful of CRISPR endonucleases with

5 of 10

Page 6: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

on February 17, 2021

http://advances.sciencemag.org/

Dow

nloaded from

minimal specificity requirements. Here, we have developed an analo-gous platform for genome editing using ScCas9, a highly similar SpCas9ortholog with affinity to minimal 5′-NNG-3′ PAM sequences.

Established PAM engineering methods, such as randommutagene-sis and directed evolution, can only generate substitution mutations inprotein coding sequences. During the preparation of this manuscript,another group used phage-assisted continuous evolution (37) to evolvean SpCas9 variant, xCas9(3.7), with preference for various 5′-NG-3′PAM sequences (38). An alternative approach consists of inserting orremovingmotifswith specific properties, whichmayprovide a sequencesearch space that more common mutagenic techniques cannot directlyaccess. Here, we demonstrate an evolutionary example of this methodwith ScCas9, whose sequence disparities with SpCas9 include two diver-gent motifs that contribute to its minimal PAM sequence. Engineeredvariants lacking thesemotifs exhibitmore stringent PAM specificities inour PAM determination assays, and the removal of both motifs revertsits PAM specificity back to a more 5′-NGG-3′–like preference. While

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

minimal inconsistencies in PAM preference between the used assaysmay arise from PAM-dependent allosteric changes that drive DNAcleavage (19), the PAM flexibility of ScCas9, as compared to SpCas9,remains consistent in all tested contexts.

To date, there are limited open-source tools or platforms specifi-cally for the prediction of PAM sequences, though prior studies haveconducted internal bioinformatics-based characterizations before ex-perimental validation (9–11, 13, 30, 39). Here, we have establishedSPAMALOT as an accessible resource that we share with the commu-nity for application to CRISPR cassettes from other genera. Future de-velopment will include broadening the scope of candidate targetsbeyond genus-associated phage to capture additional sequences thatcould be beneficial targets, such as lysogens in species that host thesame phage.We hope that this pipeline can be used tomore efficientlyvalidate and engineer PAM specificities that expand the targetingrange ofCRISPR, especially for strictly PAM-constrained technologiessuch as base editing (26, 31) and homology repair induction (40).

Fig. 5. Relationship of ScCas9 to other Streptococcus orthologs. (A) PAM binding enrichment on a 5′-NNNNNNNN-3′ PAM library of ScCas9-like SpCas9 variants. ThePAM-SCANR screen (23) was applied to variants of SpCas9 containing the loop, KQ insertions, or both. SpCas9 ::Loop and SpCas9 ::Loop ::KQ failed to demonstrate PAMbinding and thus GFP expression. (B) FACS analysis of binding at 5′-NGG-3′ PAM. All samples were performed in duplicates and averaged. SD was used to calculate errorbars. (C) Sequence conservation of Streptococcus orthologs with ScCas9 as a reference. Each ortholog is referred to by its UniProt ID (16). The loop (367 to 376) and KQ(1337 and 1338) insertion alignments are indicated.

6 of 10

Page 7: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

on February 17, 2021

http://advances.sciencemag.org/

Dow

nloaded from

Last, because ScCas9 does not require any alterations to the sgRNAof SpCas9, and because of its significant sequence homology withSpCas9, we presume that identicalmodifications fromprevious studies(28–30) can be made to increase the accuracy and efficiency of the en-donuclease and its variants, although it already demonstrates potentiallyimproved on-to-off activity as compared to the standard SpCas9 on5′-NGG-3′ targets. In addition, while we have exhaustively evaluatedthe PAMspecificity of ScCas9 onmultiple targets in a variety of genomeediting contexts, we do not rule out the possibility that there may existuntested 5′-NNG-3′ genomic targets on which ScCas9 does not havesignificant activity. When used together with SpCas9 and xCas9(3.7),however, ScCas9 expands the target range of currently used Cas9 en-zymes for genome editing purposes. With further development, we an-ticipate that this broadened Streptococcus Cas9 toolkit, containing bothScCas9 and additional, uncharacterized orthologs with expanded tar-geting range, will enhance the current set of CRISPR technologies.

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

MATERIALS AND METHODSIdentification of Cas9 homologs and generation of plasmidsThe UniProt database (16) was mined for all Streptococcus Cas9 pro-tein sequences, which were used as inputs to either the BioPythonpairwise2 module or Geneious to conduct global pairwise alignmentswith SpCas9, using the BLOSUM62 scoring matrix (17) and subse-quently calculate percent homology. The ScCas9was codon optimizedfor Escherichia coli, ordered as multiple gBlocks from IntegratedDNATechnologies and assembled using Golden Gate Assembly. The pSF-EF1-Alpha-Cas9WT-EMCV-Puro (OG3569) plasmid for human ex-pression of SpCas9 was purchased from Oxford Genetics, and theORFs of Cas9 variants were individually amplified by polymerasechain reaction (PCR) to generate 35–base pair (bp) extensions for sub-sequent Gibson Assembly into the OG3569 backbone. Engineering ofthe coding sequence of ScCas9 and SpCas9 for removal or insertion ofmotifs was conducted using either the Q5 Site-Directed Mutagenesis

Fig. 6. SPAMALOT PAM predictions for Streptococcus Cas9 orthologs. Spacer sequences found within the type II CRISPR cassettes associated with Cas9 ORFs fromspecified Streptococcus genomes were aligned to S. phage genomes to generate spacer-protospacer mappings. WebLogos (22), labeled with the relevant species,genome, and CRISPR repeat, were generated for sequences found at the 3′ end of candidate protospacer targets with no more than two mismatches (2 mm). (A) PAMpredictions for experimentally validated Cas9 PAM sequences in previous studies. (B) Novel PAM predictions of alternate S. thermophilus Cas9 orthologs with putativedivergent specificities. (C) Novel PAM predictions of uncharacterized Streptococcus orthologs with distinct specificities.

7 of 10

Page 8: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

on February 17, 2021

http://advances.sciencemag.org/

Dow

nloaded from

Kit (New England Biolabs) or Gibson Assembly. To generate ScCas9base-editing plasmids, pCMV-ABE(7.10) (Addgene plasmid no.102919) and pCMV-BE3 (Addgene plasmid no. 73021) were receivedas gifts from D. Liu. Similarly, the ORF of the ScCas9 D10A nickasewas amplified by PCR to generate 35-bp extensions for subsequentGibson Assembly into each base-editing architecture backbone. sgRNAplasmids were constructed by annealing oligonucleotides coding forcrRNA sequences (table S1) and 4-bp overhangs and subsequentlyperforming a T4 DNA ligase–mediated ligation reaction into a plasmidbackbone immediately downstream of the human U6 promotersequence. Assembled constructs were transformed into 50 ml of NEBTurbo Competent E. coli cells and plated onto Luria broth (LB) agarsupplementedwith the appropriate antibiotic for subsequent sequenceverification of colonies and plasmid purification.

PAM-SCANR assayPlasmids for the SpCas9 sgRNA and PAM-SCANR genetic circuit, aswell as BW25113 DlacI cells, were provided by the Beisel Lab (NorthCarolina State University). Plasmid libraries containing the target se-quence followed by either a fully randomized 8-bp 5′-NNNNNNNN-3′library or fixed PAM sequences were constructed by conducting site-directed mutagenesis on the PAM-SCANR plasmid flanking the proto-spacer sequence. Nuclease-deficient mutations (D10A and H850A)were introduced to the ScCas9 variants using Gibson Assembly. Theprovided BW25113 cells were made electrocompetent using stan-dard glycerol wash and resuspension protocols. The PAM library andsgRNA plasmids, with resistance to kanamycin (Kan) and carbenicillin(Crb), respectively, were coelectroporated into the electrocompetentcells at 2.4 kV, outgrown, and recovered in Kan + Crb LB mediumovernight. The outgrowth was diluted 1:100, grown to ABS600 of0.6 in Kan + Crb LB liquid medium, and made electrocompetent. In-dicatedCas9 plasmids, with resistance to chloramphenicol (Chl), wereelectroporated in duplicates into the electrocompetent cells harboringboth the dCas9 and sgRNA plasmids, outgrown, and collected in 5 mlof Kan + Crb + Chl LBmedium. Overnight cultures were diluted to anABS600 of 0.01 and cultured to an OD600 (optical density at 600 nm)of 0.2. Cultures were analyzed and sorted on a FACSAria machine(BectonDickinson). Events were gated on the basis of forward scatterand side scatter, and fluorescence was measured in the fluoresceinisothiocyanate (FITC) channel (488-nm laser excitation, 530/30 fil-ter for detection), with at least 30,000 gated events for data analysis.Sorted GFP-positive cells were grown to sufficient density, and plas-mids from the presorted and sorted populations were then isolated,and the region flanking the nucleotide library was PCR-amplifiedand submitted for Sanger sequencing (Genewiz). Bacteria harboringnonlibrary PAM plasmids, performed in duplicates, were analyzedby FACS following electroporation and overnight incubation andrepresented as the percentage of GFP-positive cells in the populationusing SD to calculate error bars. The PAM-SCANR spacer sequenceis 5′-CGAAAGGTTTTGCACTCGAC-3′. Additional details on thePAM-SCANR assay can be found in Leenay et al. (23).

Cell culture and gene modification analysisHEK293T cells were maintained in Dulbecco’s modified Eagle’s me-dium supplemented with 100U/ml penicillin, 100U/ml streptomycin,and 10% fetal bovine serum. sgRNA plasmid (500 ng) and effector[nuclease, BE3, or ABE(7.10)] plasmid (500 ng) were transfected intocells as duplicates (2× 105 perwell in a 24-well plate)with Lipofectamine2000 (Invitrogen) inOpti-MEM(Gibco).After 48hours posttransfection,

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

genomicDNAwas extracted using QuickExtract Solution (Epicentre),and genomic loci were amplified by PCRusing theKAPAHiFiHotStartReadyMix (Kapa Biosystems). For base-editing analysis, ampliconswere purified and submitted for Sanger sequencing (Genewiz). For indelanalysis, the T7E1 reaction was conducted according to the manufac-turer’s instructions, and equal volumes of products were analyzed on a2% agarose gel stained with SYBR Safe (Thermo Fisher Scientific). Un-processed gel image files were analyzed in Fiji (41). The cleaved bands ofinterest were isolated using the rectangle tool, and the areas under thecorresponding peaks were measured and calculated as the fractioncleaved of the total product. Percent gene modification was calculatedas follows (42)

%gene modification ¼ 100� 1� ð1� fraction cleavedÞ12� �

All samples were performed in duplicates, and percent gene modi-fications were averaged. SD was used to calculate error bars.

Base-editing analysis with TLRHEK293T cells were maintained, as previously described, and trans-fected with the corresponding sgRNA plasmids (333 ng), ABE7.10plasmids (333 ng), and synthetically constructed TLR plasmids(333 ng) into cells as duplicates (2 × 105 per well in a 24-well plate)with Lipofectamine 2000 (Invitrogen) in Opti-MEM (Gibco). After5 days posttransfection, cells were harvested and analyzed on aFACSCelesta machine (Becton Dickinson) for mCherry (561-nmlaser excitation, 610/20 filter for detection) and GFP (488-nm laserexcitation, 530/30 filter for detection) fluorescence. Cells expressingmCherry were gated, and percent GFP+ calculation of the subsetwere calculated. All samples were performed in duplicates, and per-centage values were averaged. SD was used to calculate error bars.The TLR spacer sequence is 5′-TTCTGTAGTCGACGGTACCG-3′.

Base-editing evaluation program (BEEP)The BEEP software was written in Python, using the pandas data ma-nipulation library and BioPython package. As inputs, the program re-quires a sample ab1 file, a negative control ab1 file, a target sequence,and the position of the specified base conversion, handled as a.csv fileeither for multiple sample analysis or for individual samples on thecommand line. Briefly, the provided target sequences are aligned tothe base calls of each input ab1 file to determine the absolute positionof the target within the file. Subsequently, the peak values for each baseat the indicated position in the spacer are obtained, and the editingpercentage of the specified base conversion is calculated. Last, aseparate function normalizes the editing percentage to that of the neg-ative control ab1 file to account for background signals of each base.The final base conversion percentage is outputted to the same .csv filefor downstream analysis. The BEEP software can be downloaded athttps://github.com/mitmedialab/BEEP.

SPAMALOT pipelineAll 11,440 Streptococcus bacterial and 53 Streptococcus-associatedphage genomes were downloaded from the National Center for Bio-technology Information (NCBI). CRISPR repeats cataloged for the genuswere downloaded fromCRISPRdb hosted by the University of Paris-Sud(43). For each genome, spacers upstream of a specific repeat sequencewere collected with a toolchain consisting of the fast and memory-efficient Bowtie 2 alignment (44). Each genome and repeat-type–specific

8 of 10

Page 9: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

collection of spacers were then matched to all phage genomes using theoriginal Bowtie short-sequence alignment tool (45) to identify candidateprotospacers with at most one, two, or no mismatches. Unique candi-dates were input into the WebLogo 3 (22) command line tool for pre-diction of PAM features. The SPAMALOT software can be downloadedat https://github.com/mitmedialab/SPAMALOT.

Statistical analysisData are shownasmeans ± SDunless stated otherwise. Statistical anal-ysis was performed using the two-tailed Student’s t test, using theSciPy software package. Calculated P values, as compared to the neg-ative control, are represented as follows: *P≤ 0.05, **P≤ 0.01, ***P≤0.001, and ****P ≤ 0.0001. Data were plotted using Matplotlib.

Dow

nlo

SUPPLEMENTARY MATERIALSSupplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/10/eaau0766/DC1Table S1. Protein and nucleotide sequences used in this study, grouped by corresponding figure.Table S2. Extended data containing raw and calculated values acquired in this study, groupedby corresponding figure.

on February 17, 2021

http://advances.sciencemag.org/

aded from

REFERENCES AND NOTES1. M. Jinek, K. Chylinski, I. Fonfara, M. Hauer, J. A. Doudna, E. Charpentier, A programmable

dual-RNA–guided DNA endonuclease in adaptive bacterial immunity. Science 337,816–821 (2012).

2. B. Zetsche, J. S. Gootenberg, O. O. Abudayyeh, I. M. Slaymaker, K. S. Makarova,P. Essletzbichler, S. E. Volz, J. Joung, J. van der Oost, A. Regev, E. V. Koonin, F. Zhang, Cpf1is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell 163, 759–771(2015).

3. L. S. Qi, M. H. Larson, L. A. Gilbert, J. A. Doudna, J. S. Weissman, A. P. Arkin, W. A. Lim,Repurposing CRISPR as an RNA-guided platform for sequence-specific control of geneexpression. Cell 152, 1173–1183 (2013).

4. R. Barrangou, P. Horvath, A decade of discovery: CRISPR functions and applications.Nat. Microbiol. 2, 17092 (2017).

5. F. J. Mojica, C. Díez-Villaseñor, J. García-Martínez, C. Almendros, Short motif sequencesdetermine the targets of the prokaryotic CRISPR defense system. Microbiology 155,733–740 (2009).

6. S. A. Shah, S. Erdmann, F. J. Mojica, R. A. Garrett, Protospacer recognition motifs: Mixedidentities and functional diversity. RNA Biol. 10, 891–899 (2013).

7. S. H. Sternberg, S. Redding, M. Jinek, E. C. Greene, J. A. Doudna, DNA interrogation by theCRISPR RNA-guided endonuclease Cas9. Nature 507, 62–67 (2014).

8. F. Jiang, K. Zhou, L. Ma, S. Gressel, J. A. Doudna, A Cas9-guide RNA complex preorganizedfor target DNA recognition. Science 384, 1477–1481 (2015).

9. F. A. Ran, L. Cong, W. X. Yan, D. A. Scott, J. S. Gootenberg, A. J. Kriz, B. Zetsche, O. Shalem,X. Wu, K. S. Makarova, E. V. Koonin, P. A. Sharp, F. Zhang, In vivo genome editingusing Staphylococcus aureus Cas9. Nature 520, 186–191 (2015).

10. K. M. Esvelt, P. Mali, J. L. Braff, M. Moosburner, S. J. Yaung,G. M. Church, Orthogonal Cas9proteins for RNA-guided gene regulation and editing. Nat. Methods 10, 1116–1121(2013).

11. E. Kim, T. Koo, S. W. Park, D. Kim, K. Kim, H. Y. Cho, D. W. Song, K. J. Lee, M. H. Jung, S. Kim,J. H. Kim, J. H. Kim, J. S. Kim, In vivo genome editing with a small Cas9 orthologue derivedfrom Campylobacter jejuni. Nat. Commun. 8, 14500 (2017).

12. H. Hirano, J. S. Gootenberg, T. Horii, O. O. Abudayyeh, M. Kimura, P. D. Hsu, T. Nakane,R. Ishitani, I. Hatada, F. Zhang, H. Nishimasu, O. Nureki, Structure and engineering ofFrancisella novicida Cas9. Cell 164, 950–961 (2016).

13. L. B. Harrington, D. Paez-Espino, B. T. Staahl, J. S. Chen, E. Ma, N. C. Kyrpides, J. A. Doudna,A thermostable Cas9 with increased lifetime in human plasma. Nat. Commun. 8, 1424(2017).

14. B. P. Kleinstiver, M. S. Prew, S. Q. Tsai, V. V. Topkar, N. T. Nguyen, Z. Zheng, A. P. Gonzales,Z. Li, R. T. Peterson, J. R. Yeh, M. J. Aryee, J. K. Joung, Engineered CRISPR-Cas9 nucleaseswith altered PAM specificities. Nature 523, 481–485 (2015).

15. L. Gao, D. B. T. Cox, W. X. Yan, J. C. Manteiga, M. W. Schneider, D. B. T. Cox, W. X. Yan,J. C. Manteiga, M. W. Schneider, T. Yamano, H. Nishimasu, O. Nureki, N. Crosetto, F. Zhang,Engineered Cpf1 variants with altered PAM specificities. Nat. Biotechnol. 35, 789–792(2017).

16. The UniProt Consortium, UniProt: The universal protein knowledgebase. Nucleic Acids Res.45, D158–D169 (2017).

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

17. S. Henikoff, J. G. Henikoff, Amino acid substitution matrices from protein blocks.Proc. Natl. Acad. Sci. U.S.A. 89, 10915–10919 (1992).

18. H. Nishimasu, F. A. Ran, P. D. Hsu, S. Konermann, S. I. Shehata, N. Dohmae, R. Ishitani,F. Zhang, O. Nureki, Crystal structure of Cas9 in complex with guide RNA and target DNA.Cell 156, 935–949 (2014).

19. C. Anders, K. Bargsten, M. Jinek, Structural plasticity of PAM recognition by engineeredvariants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).

20. T. Lefébure, V. P. Richards, P. Lang, P. Pavinski-Bitar, M. J. Stanhope, Gene repertoireevolution of Streptococcus pyogenes inferred from phylogenomic analysis withStreptococcus canis and Streptococcus dysgalactiae. PLOS ONE 7, e37607 (2012).

21. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, D. J. Lipman, Basic local alignment searchtool. Jour. of Mol. Biol. 215, 403–410 (1990).

22. G. E. Crooks, G. Hon, J. M. Chandonia, S. E. Brenner, WebLogo: A sequence logo generator.Genome Res. 14, 1188–1190 (2004).

23. R. T. Leenay, K. R. Maksimchuk, R. A. Slotkowski, R. N. Agrawal, A. A. Gomaa, A. E. Briner,R. Barrangou, C. L. Beisel, Identifying and visualizing functional PAM diversity acrossCRISPR-Cas systems. Mol. Cell 62, 137–147 (2016).

24. P. D. Hsu, D. A. Scott, J. A. Weinstein, F. A. Ran, S. Konermann, V. Agarwala, Y. Li, E. J. Fine,X. Wu, O. Shalem, T. J. Cradick, L. A. Marraffini, G. Bao, F. Zhang, DNA targeting specificityof RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).

25. M. T. Certo, B. Y. Ryu, J. E. Annis, M. Garibov, J. Jarjour, D. J. Rawlings, A. M. Scharenberg,Tracking genome engineering outcome at individual DNA breakpoints. Nat. Methods8, 671–676 (2011).

26. N. M. Gaudelli, A. C. Komor, H. A. Rees, M. S. Packer, A. H. Badran,D. R. Liu, Programmablebase editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471(2017).

27. S. Q. Tsai, Z. Zheng, N. T. Nguyen, M. Liebers, V. V. Topkar, N. Wyvekens, C. Khayter,A. J. Iafrate, L. P. Le, M. J. Aryee, J. K. Joung, GUIDE-seq enables genome-wide profiling ofoff-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015).

28. I. M. Slaymaker, L. Gao, B. Zetsche, D. A. Scott, W. X. Yan, F. Zhang, Rationally engineeredCas9 Nucleases with improved specificity. Science 351, 84–88 (2016).

29. B. P. Kleinstiver, V. Pattanayak, M. S. Prew, S. Q. Tsai, N. T. Nguyen, Z. Zheng, J. K. Joung,High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-targeteffects. Nature 529, 490–495 (2016).

30. J. S. Chen, Y. S. Dagdas, B. P. Kleinstiver, M. M. Welch, A. A. Sousa, L. B. Harrington,S. H. Sternberg, J. K. Joung, A. Yildiz, J. A. Doudna, Enhanced proofreading governsCRISPR-Cas9 targeting accuracy. Nature 550, 407–410 (2017).

31. A. C. Komor, Y. B. Kim, M. S. Packer, J. A. Zuris, D. R. Liu, Programmable editing of a targetbase in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424(2016).

32. V. P. Richards, R. N. Zadoks, P. D. Pavinski Bitar, T. Lefébure, P. Lang, B. Werner, L. Tikofsky,P. Moroni, M. J. Stanhope, Genome characterization and population genetic structureof the zoonotic pathogen, Streptococcus canis. BMC Microbiol. 12, 293 (2012).

33. V. P. Richards, S. R. Palmer, P. D. Pavinski Bitar, X. Qin, G. M. Weinstock, G. M. Weinstock,S. K. Highlander, C. D. Town, R. A. Burne, M. J. Stanhope, Phylogenomics and thedynamic genome evolution of the genus Streptococcus. Genome Biol. Evol. 6, 741–753(2014).

34. S. A. Shmakov, V. Sitnik, K. S. Makarova, Y. I. Wolf, K. V. Severinov, E. V. Koonin, The CRISPRspacer space is dominated by sequences from species-specific mobilomes. MBio 8,e01397–17 (2017).

35. M. Müller, C. M. Lee, G. Gasiunas, T. H. Davis, T. J. Cradick, V. Siksnys, G. Bao, T. Cathomen,C. Mussolino, Streptococcus thermophilus CRISPR-Cas9 systems enable specific editing ofthe human genome. Mol. Ther. 24, 636–644 (2016).

36. I. Fonfara, A. L. Rhun, K. Chylinski, K. S. Makarova, A. L. Lécrivain, J. Bzdrenga, E. V. Koonin,E. Charpentier, Phylogeny of Cas9 determines functional exchangeability of dual-RNAand Cas9 among orthologous type II CRISPR-Cas systems. Nucleic Acids Res. 42,2577–2590 (2014).

37. K. M. Esvelt, J. C. Carlson, D. R. Liu, A system for the continuous directed evolution ofbiomolecules. Nature 472, 499–503 (2011).

38. J. H. Hu, S. M. Miller, M. H. Geurts, W. Tang, L. Chen, N. Sun, C. M. Zeina, X. Gao, H. A. Rees,Z. Lin, D. R. Liu, Evolved Cas9 variants with broad PAM compatibility and high DNAspecificity. Nature 556, 57–63 (2018).

39. C. Hidalgo-Cantabrana, A. B. Crawley, B. Sanchez, R. Barrangou, Characterizationand exploitation of CRISPR Loci in Bifidobacterium longum. Front. Microbiol. 8, 1851(2017).

40. C. D. Richardson, G. J. Ray, M. A. DeWitt, G. L. Curie, J. E. Corn, Enhancing homology-directedgenome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donorDNA. Nat. Biotechnol. 34, 339–344 (2016).

41. J. Schindelin, I. Arganda-Carreras, E. Frise, V. Kaynig, M. Longair, T. Pietzsch, S. Preibisch,C. Rueden, S. Saalfeld, B. Schmid, J. Y. Tinevez, D. J. White, V. Hartenstein, K. Eliceiri,P. Tomancak, A. Cardona, Fiji: An open-source platform for biological-image analysis.Nat. Methods 9, 676–682 (2012).

9 of 10

Page 10: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

SC I ENCE ADVANCES | R E S EARCH ART I C L E

42. D. Y. Guschin, A. J. Waite, G. E. Katibah, J. C. Miller, M. C. Holmes, E. J. Rebar, A rapid andgeneral assay for monitoring endogenous gene modification. Methods Mol. Biol. 649,247–256 (2010).

43. I. Grissa, G. Vergnaud, C. Pourcel, The CRISPRdb database and tools to display CRISPRsand to generate dictionaries of spacers and repeats. BMC Bioinform. 8, 172 (2007).

44. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat. Methods 9,357–359 (2012).

45. B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Ultrafast and memory-efficient alignmentof short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

Acknowledgments: We are very grateful to R. Leenay and C. Beisel for sharing PAM-SCANRplasmids and cells and for providing guidance in implementing the PAM-SCANR protocol.We thank L. Nip for technical assistance and T. Karydis for critical assistance in modeling PDBdomain insertions. We also thank E. Boyden for access to cell culture, in addition toN. Gershenfeld and S. Zhang for shared laboratory equipment. Author contributions: P.C.and N.J. conceived study and designed experiments. P.C. and N.J. carried out experiments. P.C.

Chatterjee et al., Sci. Adv. 2018;4 : eaau0766 24 October 2018

and N.J. conducted data analyses. P.C. and N.J. designed and implemented bioinformaticsworkflows. P.C. and N.J. wrote the paper. P.C., N.J., and J.M.J. reviewed the paper. J.M.J.supervised the project. Competing interests: P.C., N.J., and J.M.J. are inventors on aprovisional patent related to this work (no. 62/560,630; filed 19 September 2017). All authorsdeclare that they have no additional competing interests. Data and materials availability:All data needed to evaluate the conclusions in the paper are present in the paper and/orthe Supplementary Materials. BEEP and SPAMALOT code, as well as WebLogos, are hosted onGitHub. Additional data related to this paper may be requested from the authors.

Submitted 20 May 2018Accepted 17 September 2018Published 24 October 201810.1126/sciadv.aau0766

Citation: P. Chatterjee, N. Jakimo, J. M. Jacobson, Minimal PAM specificity of a highly similarSpCas9 ortholog. Sci. Adv. 4, eaau0766 (2018).

10 of 10

on February 17, 2021

http://advances.sciencemag.org/

Dow

nloaded from

Page 11: Minimal PAM specificity of a highly similar SpCas9 ortholog · the PAM for ScCas9 by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the S. canis genome

Minimal PAM specificity of a highly similar SpCas9 orthologPranam Chatterjee, Noah Jakimo and Joseph M. Jacobson

DOI: 10.1126/sciadv.aau0766 (10), eaau0766.4Sci Adv 

ARTICLE TOOLS http://advances.sciencemag.org/content/4/10/eaau0766

MATERIALSSUPPLEMENTARY http://advances.sciencemag.org/content/suppl/2018/10/22/4.10.eaau0766.DC1

REFERENCES

http://advances.sciencemag.org/content/4/10/eaau0766#BIBLThis article cites 45 articles, 4 of which you can access for free

PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions

Terms of ServiceUse of this article is subject to the

is a registered trademark of AAAS.Science AdvancesYork Avenue NW, Washington, DC 20005. The title (ISSN 2375-2548) is published by the American Association for the Advancement of Science, 1200 NewScience Advances

License 4.0 (CC BY-NC).Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of

on February 17, 2021

http://advances.sciencemag.org/

Dow

nloaded from