Orthologue identification, phylogenetic analysis and sequence alignment
We used previously identified Cas12a2 protein sequences27 as queries for tBLASTn and BLASTp searches in the NCBI databases (https://www.ncbi.nlm.nih.gov) and the JGI Integrated Microbial Genomes and Microbiomes database (https://img.jgi.doe.gov) to identify closely related orthologues. The resulting amino acid sequences, along with Cas12a orthologues used as an outgroup, were aligned using Clustal Omega59. The trimmed alignment generated using ClipKIT60 was then used to reconstruct a phylogeny using IQ-TREE (v.2.3.6) (-m MFP -T 8 -B 1000)60,61 with a maximum-likelihood approach. Assignment of putative domains and conserved amino acid residues, as shown in Fig. 1a,b and Extended Data Fig. 1, was performed with reference to SuCas12a2 (refs. 27,28). The amino acid sequences of the nucleases are provided in Supplementary Data 1. Information on each nuclease, including contig accession numbers, source organisms and the presence of spacer acquisition genes (cas1, cas2 and cas4), is provided in Supplementary Table 1. The presence of these genes was determined using DefenseFinder (v.2.0.1) with defense-finder-models (v.2.0.2)62. CRISPR arrays were identified using CRISPRCasFinder (v.4.2.21)63.
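As an illustration only, the alignment, trimming and tree-inference steps could be chained as in the following Python sketch. Only the IQ-TREE options (-m MFP -T 8 -B 1000) come from the text. The file names, the working directory, the executable names (clustalo, clipkit, iqtree2) and the use of default settings for Clustal Omega and ClipKIT are assumptions.

```python
import subprocess
from pathlib import Path


def build_commands(fasta, workdir="phylo"):
    """Return the three command lines (as argument lists) for the pipeline.

    File names are hypothetical placeholders; only the IQ-TREE options
    (-m MFP -T 8 -B 1000) are taken from the text.
    """
    out = Path(workdir)
    aln = out / "orthologues.aln.fasta"
    trimmed = out / "orthologues.trimmed.fasta"
    return [
        # Clustal Omega with default settings (assumption)
        ["clustalo", "-i", str(fasta), "-o", str(aln), "--outfmt=fasta"],
        # ClipKIT with its default trimming mode (assumption)
        ["clipkit", str(aln), "-o", str(trimmed)],
        # IQ-TREE 2: ModelFinder Plus, 8 threads, 1,000 ultrafast bootstraps
        ["iqtree2", "-s", str(trimmed), "-m", "MFP", "-T", "8", "-B", "1000"],
    ]


def run_pipeline(fasta, workdir="phylo"):
    """Run each step in order; requires the three tools on the PATH."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(fasta, workdir):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("cas12a2_orthologues.fasta") (a hypothetical input file) would execute the three tools in sequence and stop at the first failure.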
Strains, plasmids and oligonucleotides
Strains, plasmids and oligonucleotides used in this study are listed in Supplementary Table 2. Nuclease-encoding sequences were codon-optimized for expression in E. coli and synthesized by Twist Bioscience, unless stated otherwise. DNA oligonucleotides and FAM-labelled reporters were synthesized by Integrated DNA Technologies.
PFS screen in E. coli
A PFS-containing plasmid library (CBS-6873) was constructed by incorporating a target sequence (CAO1: 5′-CAUCAAGCCUUCCUUCAGGUGUUGCUCCA-3′) followed by 1,024 combinations of five randomized nucleotides (NNNNN). Briefly, the target, placed under the PJ23119 promoter (https://parts.igem.org/Promoters/Catalog/Anderson), was cloned into a low-copy sc101 plasmid (around 5 copies per cell), which was then amplified using the primers ODpr23 and ODpr24 (Supplementary Table 2), with the forward primer including a 5-nucleotide randomized overhang. After DpnI treatment to remove template DNA, the resulting PCR product was ligated and electroporated into E. coli TOP10, which produced >2 million transformants (about 2,000-fold library coverage). The PFS preferences of Cas12a, Cas12a2, Cas12a3 and Cas12a4 orthologues were assessed by targeting the CBS-6873 library with a CAO1-targeting crRNA plasmid (CBS-6875), using a non-targeting crRNA plasmid (CBS-6876) as a control. The nuclease-encoding sequences were codon-optimized for E. coli and expressed under a T7 promoter, whereas crRNAs were driven by the PJ23119 promoter. E. coli BL21(AI) cells with the nuclease and crRNA plasmids were electroporated in three separate reactions, each using around 500 ng of library DNA in 50 µl of competent cells. Cells were recovered in LB with 0.1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) and 0.2% l-arabinose and grown overnight, which produced about 2 million transformants (about 2,000-fold library coverage). Plasmids were then purified using a ZymoPURE II Plasmid Midiprep kit (D4201). Reactions for each experimental condition were carried out in duplicate.
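The library size and fold coverage quoted above follow from simple arithmetic, shown here as a minimal Python sketch. The function names are illustrative and not part of the published workflow.

```python
from itertools import product


def pfs_library(length=5, alphabet="ACGT"):
    """Enumerate every possible PFS of the given length."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


def fold_coverage(n_transformants, library_size):
    """Average number of transformants per library member."""
    return n_transformants / library_size


library = pfs_library()
print(len(library))                                   # 1024
print(round(fold_coverage(2_000_000, len(library))))  # 1953
```

With 2 million transformants and 4^5 = 1,024 variants, each PFS is represented about 1,950 times on average, consistent with the approximately 2,000-fold coverage stated in the text.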
Purified plasmids from both target and non-target conditions were first PCR-amplified using the primers ODpr55 and ODpr56 (Supplementary Table 2) with KAPA HiFi HotStart polymerase (KK2601) for 20 cycles at 64.5 °C following the manufacturer’s protocol. After amplification, these PCR products were purified using AMPure XP beads (Beckman Coulter, A63880) and subsequently indexed for Illumina sequencing using standard indexing primers with KAPA HiFi HotStart polymerase (KK2601) for 8 cycles at 61.5 °C with 2 µM forward and reverse primers and 5 ng µl⁻¹ DNA. The resulting indexed PCR products were sequenced on an Illumina NovaSeq 6000 (paired-end, 150 bp reads), yielding at least 2 million reads per sample. Raw FASTQ files were processed with Trimmomatic (v.0.39)64 using the following parameters: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10, LEADING:3, TRAILING:3 and SLIDINGWINDOW:4:15. Paired-end reads were merged using BBMerge (qtrim=t, trimq=10, minlength=20)64,65. Sequences containing motifs matching “TTCCTTCAGGTGTTGCTCCA(.....)GGTGAGTTCT”, corresponding to the 20-nucleotide target-encoding sequence, the five-nucleotide PFS and the 10-nucleotide downstream sequence, were extracted, excluding any sequences with ambiguous bases (N) or Phred scores below 20. Depletion scores were then calculated using the formula: depletion = (sum(non-target)/sum(target)) × (count(target)/count(non-target)), where count is the number of reads for a given PFS and sum is the total number of reads in that condition. The log2 fold change values for these scores were computed for the nucleotides at PFS positions (+1 to +5), and scatterplots visualizing the PFS preferences were generated using Matplotlib in Python.
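The motif extraction and depletion formula can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function names and the pseudocount are assumptions, Phred filtering is omitted, and reads are assumed to be merged sequences supplied as plain strings.

```python
import math
import re
from collections import Counter

# 20-nt target, 5-nt PFS (captured), 10-nt downstream flank.
# [ACGT]{5} also rejects PFS sequences containing ambiguous bases (N).
PFS_PATTERN = re.compile(r"TTCCTTCAGGTGTTGCTCCA([ACGT]{5})GGTGAGTTCT")


def count_pfs(reads):
    """Count reads per PFS among merged read sequences."""
    counts = Counter()
    for read in reads:
        match = PFS_PATTERN.search(read)
        if match:
            counts[match.group(1)] += 1
    return counts


def positional_counts(counts):
    """Collapse PFS counts to (position, nucleotide) keys, positions +1 to +5."""
    per_position = Counter()
    for pfs, n in counts.items():
        for position, nucleotide in enumerate(pfs, start=1):
            per_position[(position, nucleotide)] += n
    return per_position


def log2_depletion(target, non_target, pseudocount=1):
    """log2 of (sum(non-target)/sum(target)) * (count(target)/count(non-target)).

    The pseudocount is an assumption added to avoid log2(0) when a PFS
    is fully depleted; it is not described in the text.
    """
    sum_target = sum(target.values())
    sum_non_target = sum(non_target.values())
    scores = {}
    for key, n_non_target in non_target.items():
        n_target = target.get(key, 0) + pseudocount
        ratio = (sum_non_target / sum_target) * (n_target / (n_non_target + pseudocount))
        scores[key] = math.log2(ratio)
    return scores
```

With the formula as written, a PFS that is depleted under targeting conditions receives a negative log2 value. The same function works on whole-PFS counts or on the output of positional_counts, because it only relies on matching dictionary keys.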
Plasmid clearance in E. coli
To test plasmid clearance, we expressed orthologues codon-optimized for E. coli (as in the PFS screen) from plasmids (Supplementary Table 2). Target RNA and crRNA were co-expressed from a single plasmid under separate PJ23119 promoters (plasmids CBS-6177–6182 in Supplementary Table 2 include CAO1, GAPDH and GFP targets with both target and non-target crRNAs). E. coli BL21(AI) cells with these target–guide constructs were electroporated with 1 µl of high-purity nuclease plasmid at 50 ng µl⁻¹. Cells were then recovered in LB with 0.2% l-arabinose and 1 mM IPTG at 37 °C for 1 h with shaking and without antibiotics. Serial tenfold dilutions (up to 10⁻¹ in 1× PBS) were then prepared, and 5 µl of each dilution was spotted onto LB agar plates containing 0.2% l-arabinose, 0.1 mM IPTG and the appropriate antibiotics: 50 µg ml⁻¹ kanamycin for selection of the nuclease plasmid alone (assessing growth arrest) or 50 µg ml⁻¹ kanamycin plus 25 µg ml⁻¹ chloramphenicol for co-selection (assessing plasmid clearance). Plates were incubated overnight at 37 °C. Experiments were performed in four biological replicates.
relA deletion in E. coli