Dual activities of an X-family DNA polymerase regulate CRISPR-induced insertional mutagenesis across species – Nature.com

Posted: July 27, 2024 at 8:04 pm

Low predictive power of CRISPR-Cas9 mutagenesis prediction programs for plants

To assess the predictability of CRISPR-Cas9-induced mutations in plants, we examined the performance of two widely used CRISPR mutagenesis prediction programs, FORECasT and InDelphi11,12. We generated CRISPR-induced mutations at 59 sites, including 26 from Arabidopsis and 33 from Setaria, by introducing the corresponding CRISPR-Cas9 constructs into each species (Supplementary Data 1). In Arabidopsis, each CRISPR-Cas9 construct was introduced using the floral dip-based stable transgenic approach. Individual seedlings of each T1 transgenic plant were collected for the CRISPR mutation analysis at each target site. In Setaria, individual CRISPR-Cas9 constructs were introduced via transient protoplast transfection. Transformed protoplasts were collected after 48 h for the mutation assay. Subsequently, the mutation profile of each site was obtained using a next-generation sequencing (NGS)-based assay. The indel mutagenesis rates averaged 8.9% and 28.4% at the sites from Arabidopsis and Setaria, respectively (Supplementary Fig. 1a). Additionally, the indel profile from each site was further divided into individual insertion and deletion types for each species (Fig. 1a). Notably, 1-bp insertions were among the most commonly occurring mutation types, as previously observed in human cell lines. In Arabidopsis, 1-bp insertions were the most prevalent mutation type, accounting for an average of 44.6% of all mutations across 26 CRISPR sites (Fig. 1a). For Setaria viridis, the average 1-bp insertion rate was the 4th highest at 9.6% across 33 CRISPR sites (Fig. 1a).

a CRISPR-Cas9-induced mutation profiles across 59 target sites in Arabidopsis (n=26) and Setaria (n=33). The X-axis represents individual indel sizes. The normalized mutation rates (Y-axis) were determined by dividing the number of reads containing mutations within each indel size by the total number of reads containing all types of mutations. The horizontal bars within boxes represent medians. The top and bottom edges of the boxes represent the 75th and 25th percentiles, respectively. The upper and lower whiskers extend to data no more than 1.5x the interquartile range from the upper and lower edges of the box, respectively. b, c Two-sided Pearson correlation analyses were performed, with scatter plots comparing predicted versus experimentally observed insertion (ins.) rates for each CRISPR gRNA in Arabidopsis (n=26) and Setaria (n=33). The 95% confidence intervals (CI) are indicated in gray. The source data are provided in the Source Data file.

Simultaneously, the predicted mutation profile was generated for each target site using FORECasT and InDelphi. In this study, we focused on insertion rates when correlating predicted with observed values for two reasons: (1) CRISPR-induced insertions appeared to exhibit less stochastic patterns than deletions; and (2) previous studies have suggested that these prediction tools demonstrate greater predictive power for insertions than for other indel types11,12,14. We observed no positive correlations using either FORECasT or InDelphi for either the Arabidopsis or the Setaria dataset (Fig. 1b, c). Weak negative correlations were observed in the Arabidopsis dataset (r=-0.56, p<0.0031 and r=-0.4, p<0.036; Fig. 1b), while no correlation was found in the Setaria dataset (r=0.18, p<0.31 and r=0.07, p<0.69; Fig. 1c). Thus, our data suggested that both prediction programs, which were developed with human datasets, exhibited low predictive power for CRISPR-Cas9-induced mutation profiles in plants.
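The predicted-versus-observed comparison rests on a Pearson correlation coefficient. The following is a minimal sketch of that calculation, assuming one predicted and one observed insertion rate per gRNA. The function name `pearson_r` and the sample values are illustrative assumptions, not data or code from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted and observed insertion rates (%), one pair per gRNA.
predicted = [10.0, 20.0, 30.0, 40.0]
observed = [35.0, 30.0, 22.0, 15.0]
r = pearson_r(predicted, observed)
print(f"r = {r:.2f}")  # negative: observed rates fall as predicted rates rise
```

A negative coefficient, as in this made-up example, means that sites predicted to have high insertion rates tended to show low observed rates.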

The limited predictive power of the human cell-based indel prediction tools prompted us to further examine CRISPR-Cas9-induced insertion profiles in plants. Both FORECasT and InDelphi predicted CRISPR-induced insertions primarily as 1-bp insertion events occurring at the 4th position upstream of the PAM, and most of these insertions were templated insertions derived by duplicating the 4th nucleotide. When we analyzed the observed insertions from the Arabidopsis and Setaria target sites, 1-bp insertions were consistently predominant, accounting for an average of 95.9% of insertions across all sites (Supplementary Fig. 1b). However, when the 1-bp insertion patterns were plotted according to the 4th nucleotide, the observed insertions did not consistently exhibit characteristics of templated insertions in plants (Fig. 2a). When the 4th nucleotide was T, the inserted nucleotide appeared to follow the templated insertion model, with 78.8% and 75.7% of insertions being T in Arabidopsis and Setaria, respectively (Fig. 2a). When the 4th nucleotide was A, A remained the predominant inserted nucleotide (58.5% and 58.7% in Arabidopsis and Setaria), but the fractions of other types of insertions, termed non-templated insertions, increased substantially (Fig. 2a). In cases where the 4th nucleotide was either C or G, non-templated insertions became predominant, rising to 61.4% and 66.0% for the 4th nucleotide C, and 98.4% and 99.5% for the 4th nucleotide G in Arabidopsis and Setaria, respectively.
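The templated versus non-templated distinction can be sketched in a few lines: a 1-bp insertion counts as templated when the inserted nucleotide duplicates the 4th nucleotide upstream of the PAM, and as non-templated otherwise. The function name `templated_fraction` and the read counts below are hypothetical, chosen only to illustrate the bookkeeping.

```python
def templated_fraction(insert_counts, fourth_nt):
    """Split 1-bp insertion reads into templated and non-templated fractions.

    insert_counts maps each inserted nucleotide (T, A, C, G) to a read count.
    An insertion is treated as templated when it duplicates fourth_nt.
    """
    total = sum(insert_counts.values())
    if total == 0:
        raise ValueError("no 1-bp insertion reads")
    templated = insert_counts.get(fourth_nt, 0) / total
    return templated, 1.0 - templated


# Hypothetical read counts at a site whose 4th nucleotide is G.
counts = {"T": 450, "A": 400, "C": 130, "G": 20}
templated, non_templated = templated_fraction(counts, "G")
print(f"templated: {templated:.1%}, non-templated: {non_templated:.1%}")
```

With these made-up counts, only the 20 reads carrying an inserted G are templated, so the site would be classified as non-templated dominant.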

a Cross-species 1-bp insertion patterns relative to the 4th nucleotide. The 1-bp insertions were divided into 4 groups according to the inserted nucleotide for each CRISPR gRNA. The normalized 1-bp insertion (ins.) rates were calculated by dividing the number of reads containing each type of 1-bp insertion by the number of reads with all types of 1-bp insertions and were plotted against the 4th nucleotides (T, A, C, and G) for Setaria viridis (S.v.; n=33 biologically independent samples), Arabidopsis thaliana (A.t.; n=26 biologically independent samples), and the human cell line (H.s.; n=150 biologically independent samples). Data are presented as mean values ± SEM. b The schematic workflow to compare 1-bp insertion patterns across S.v., A.t., and the H.s. cell line through the next-generation sequencing assay. c The CRISPR targeted sequences of iPAM_T and iPAM_G. The 4th nucleotide is highlighted in red with the PAM sequence underlined. d Heatmap analyses of the proportion of each inserted nucleotide type (nt) at the 4th position of the iPAM_T and iPAM_G sites across S.v. (n=3), A.t. (n=3), and H.s. (n=3). The source data are provided in the Source Data file. b Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.

To compare with 1-bp insertion patterns in human cells, we analyzed the insertion profiles of 150 target sites previously reported from the human cell lines13. The results were largely consistent with the templated insertion model, showing the 1-bp insertion pattern with the 4th nucleotide duplications, while low levels of non-templated 1-bp insertions were observed at the target sites with 4th nucleotide as C or G (Fig.2a). Taken together, our observations revealed distinct 1-bp insertion patterns between plants and human cells. The 1-bp insertion profiles from plant species exhibited a higher incidence of non-templated insertions, deviating from the templated insertion model. Notably, the rates of non-templated insertions appeared to vary depending on the 4th nucleotide, increasing in the order of T, A, C, and G.

To further explore the distinctive 1-bp insertion profiles across species, we conducted direct comparisons by targeting identical CRISPR sites in Arabidopsis, Setaria, and the human cell line HEK293. This involved first integrating the firefly luciferase gene into the genomes of these three species and subsequently expressing the CRISPR-Cas9 cassette (Fig. 2b). We designed two CRISPR guide RNAs (gRNAs) to target overlapping sites located on opposite strands, referred to as inverted PAM (iPAM) targets, as described in previous research13 (Fig. 2c). These two gRNAs, one with T at the 4th position (iPAM_T) and the other with G (iPAM_G), represented the sequence contexts for the highest and lowest templated insertion rates observed in plants (Fig. 2c). In Arabidopsis, CRISPR-Cas9 constructs were assembled with the firefly luciferase reporter gene in the T-DNA. The resulting constructs were transformed using the floral dip-based stable transgenic approach. Three seedlings from each T1 transgenic group were collected for CRISPR mutation analysis at each target site. For Setaria viridis, a homozygous Setaria line with the firefly luciferase reporter gene integrated into the genome was obtained from previous research19. Individual CRISPR-Cas9 constructs were then transformed into protoplasts isolated from the luciferase gene-containing plants. Transformed protoplasts were collected after 48 h for the mutation assay, with three replicates for each target site. When insertion rates were examined, both CRISPR gRNAs induced substantial 1-bp insertions, ranging from 33.8% to 89.0% for the iPAM_T site and from 33.4% to 74.4% for the iPAM_G site across the three species (Supplementary Fig. 2).

Next, we analyzed templated versus non-templated insertion patterns at each target site. In the HEK293 cells, consistent with the templated insertion model, templated insertions were predominant at both target sites, with rates of 97.0% and 84.8%, respectively (Fig. 2d). However, in Arabidopsis and Setaria, predominant templated insertions were observed only at the iPAM_T site, ranging from 73.3% to 87%. At the iPAM_G site, non-templated insertions were predominant, accounting for 72.6% to 95.8% of 1-bp insertions in both plant species (Fig. 2d). Taken together, these findings corroborated the observations from the 59 individual target sites, revealing distinct plant-specific 1-bp insertion profiles. These profiles exhibited either templated or non-templated dominant patterns associated with the 4th nucleotide upstream of the PAM.

As indicated by prior studies, both epigenetic and genetic factors could influence CRISPR-Cas9-induced mutation profiles15,17,20. To explore the mechanism underlying the distinctive 1-bp insertion profiles in plants, we investigated the impact of chromatin states on these insertions. We used the multi-copy CRISPR target site (MCSite) system previously developed in Arabidopsis17. Two sets of MCSites, designated MCSite_T and MCSite_G based on their 4th nucleotide, are located in diverse epigenetic contexts as described previously17. Individual sites within each MCSite family can be categorized into two major groups, either open and unmethylated chromatin or closed and methylated chromatin (Fig. 3a, b).

Normalized 1-bp insertion rates were plotted for individual MCSite_T (a) and MCSite_G (b) sites (X-axis). The 20-bp targeted sequences with 3-bp underlined PAM sequences are shown with each plot. The normalized 1-bp insertion (ins.) rates were determined by dividing the number of reads containing 1-bp insertions by the total number of reads containing all types of indel mutations. Data are presented as mean values ± SEM from three independent plants. Heatmaps in the lower panel illustrate the proportion of each inserted nucleotide type (T, A, C, G) at the 4th position of individual MCSite_T (a) and MCSite_G (b) sites. Chromatin states of individual sites were categorized into Open and Unmethylated or Closed and Methylated groups. The source data are provided in the Source Data file.

When the 1-bp insertion rates of individual MCSites were examined, variations were found across different chromatin states as previously indicated17. For MCSite_T sites, insertion rates ranged from 7.9% to 26.8%, and for MCSite_G sites, they ranged from 41.9% to 58.9% (Fig.3a, b). In contrast, heatmap analysis of the 1-bp insertion profiles revealed a consistent pattern within each MCSite family. Specifically, for MCSite_T sites, templated insertions were predominantly observed across individual sites, regardless of their chromatin states (Fig.3a). On the other hand, all individual sites within the MCSite_G family exhibited a predominant 1-bp non-templated insertion pattern across different epigenetic contexts, ranging from 94.0% to 99.8% (Fig.3b). These results suggested that chromatin states may have limited impacts on CRISPR-Cas9 induced 1-bp insertion profile.

We then investigated the genetic factors contributing to the distinct 1-bp insertion profiles in plants. Previous studies have pointed to the X-family DNA polymerase, Pol, and its homolog as pivotal players in mediating 1-bp templated insertions in human and yeast cells13,15. A single copy of the Pol homolog was identified in both Arabidopsis and Setaria genomes through sequence homology searches21. No other X-family DNA polymerases were found in plants from the homology search. Phylogenetic analyses confirmed that this plant X-family DNA polymerase exhibited a close evolutionary relationship with Pol as opposed to other members, such as DNA Pol and Terminal deoxynucleotidyl Transferase (TdT) (Supplementary Fig.3 and Supplementary Data2).

To explore the involvement of the plant Pol homolog in CRISPR-Cas9-induced 1-bp insertions, we obtained an Arabidopsis T-DNA knock-out mutant line (atpol-1), previously characterized as having no notable growth or physiological defects22,23. Using wild-type and homozygous atpol-1 mutant Arabidopsis plants, we generated stable transgenic plants with the CRISPR-Cas9 T-DNA construct to target three distinct sites: the single-copy site in the Arabidopsis Chelatase I2 gene (AtCHLI2), as well as the MCSite_T and MCSite_G sites. Three T1 CRISPR-Cas9 transgenic plants from each genotype were used to survey CRISPR-induced mutations for each target site. The single-copy CHLI2 site allowed for a rapid assessment of the involvement of Pol in 1-bp insertions, while the two MCSites provided additional insights in different epigenetic contexts.

When we examined CRISPR-Cas9 mutagenesis at the CHLI2 site, both wild-type and mutant CRISPR-Cas9 plants displayed comparable overall mutagenesis rates, averaging 38.9% and 37.9%, respectively (Fig. 4a). In wild-type plants, approximately 25.3% of indel mutations were identified as 1-bp insertions at the 4th position, with non-templated insertions being predominant at a rate of 65.2%, attributable to the G nucleotide at the 4th position in the CHLI2 site (Fig. 4b; Supplementary Fig. 4a). In contrast, in Pol mutant plants, the 1-bp insertion rates, encompassing both non-templated and templated insertions, were reduced to nearly undetectable levels (0.2%; Fig. 4b; Supplementary Fig. 4b). Additionally, we explored the potential involvement of this Pol homolog in CRISPR-Cas9-induced deletions. We observed similar levels of deletions within three different deletion groups (1-bp, 2 to 10-bp, and more than 10-bp) between the wild-type and mutant plants (Fig. 4c). Thus, the plant Pol homolog appeared to be the pivotal gene for CRISPR-Cas9-induced 1-bp insertions, operating in both templated and non-templated manners, with limited involvement in deletions.

a CRISPR-Cas9 mutation (mut.) rates between the wild type and atpol-1 mutant plants. The mutation rates (Y-axis) were determined by dividing the number of reads containing indel mutations by the total number of NGS reads. b Normalized 1-bp insertion rates between the wild type and atpol-1 mutant plants at the CHLI2 site. The normalized 1-bp insertion (ins.) rates were determined by dividing the number of reads containing 1-bp insertions by the total number of reads containing all types of indel mutations. c Normalized deletion rates between the wild type (WT) and atpol-1 mutant plants at the CHLI2 site. The normalized proportions of deletions (Prop. of Del. as Y-axis) were determined by dividing the number of reads containing deletions within each category (1-bp, 2-10 bp, or >10 bp) by the total number of reads containing all types of deletions. d, e Normalized 1-bp insertion (ins.) rates between the wild type and atpol-1 mutant plants at the MCSite_T (d) and MCSite_G (e) sites. Heatmaps under the bar plots illustrate the proportion of each inserted nucleotide type (T, A, C, G) at the 4th position of individual MCSite_T (d) and MCSite_G (e) sites. Data are presented as mean values ± SEM from 3 independent plants. P-values were derived from an unpaired one-tailed Student's t test. The source data are provided in the Source Data file.
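The two denominators used in this caption are easy to confuse: the mutation rate divides by all NGS reads, whereas the normalized 1-bp insertion rate divides by indel-containing reads only. A minimal sketch of both calculations follows. The function names and read counts are hypothetical and are not taken from the study's pipeline or data.

```python
def mutation_rate(indel_reads, total_reads):
    """Fraction of all NGS reads that carry any indel mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return indel_reads / total_reads


def normalized_rate(category_reads, indel_reads):
    """Fraction of indel-containing reads that fall in one category,
    for example reads with a 1-bp insertion."""
    if indel_reads <= 0:
        raise ValueError("indel_reads must be positive")
    return category_reads / indel_reads


# Hypothetical read counts for one sample.
total_reads = 20000
indel_reads = 5000
one_bp_insertion_reads = 1250

print(f"mutation rate: {mutation_rate(indel_reads, total_reads):.1%}")
print(f"normalized 1-bp insertion rate: "
      f"{normalized_rate(one_bp_insertion_reads, indel_reads):.1%}")
```

Normalizing to indel-containing reads lets insertion profiles be compared between samples whose overall editing efficiencies differ.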

Furthermore, we investigated the role of this Pol homolog at additional CRISPR target sites within diverse epigenetic contexts. When examining the 1-bp insertion rates at the MCSite_T and G sites, we observed significant reductions of 1-bp insertions, both templated and non-templated, across all sites, irrespective of their chromatin states. In the MCSite_T sites, the 1-bp insertion rates decreased from an average of 19.5% in wild-type plants to 1.6% in the mutant plants, while in the MCSite_G sites, the rates were reduced from an average of 49.4% to 1.8% (Fig.4d, e). These results substantiated that the plant Pol homolog is responsible for both templated and non-templated 1-bp insertions regardless of chromatin states.

Next, we hypothesized that overexpression of AtPol could restore or even enhance the 1-bp insertion rates. To test this hypothesis, we generated stable transgenic plants overexpressing the AtPol gene in the atpol-1 mutant background. The AtPol coding sequence was driven by the constitutive Arabidopsis Ubiquitin-10 promoter and cloned into the final construct with a CRISPR-Cas9 expression cassette to target the CHLI2 and MCSite_T sites. Three T1 CRISPR-Cas9 transgenic plants with the atpol-1 mutant genotype were used to survey CRISPR-induced mutations for each target site. When 1-bp insertions were examined at the CHLI2 site, the AtPol overexpression plants exhibited a 1.6-fold increase compared to wild-type plants, with an average rate of 39.8% (Fig. 5a). The 1-bp insertion profiles appeared similar between the AtPol overexpression and the wild-type plants, with non-templated insertions still being predominant at an average rate of 74.8% (Fig. 5a). At the MCSite_T sites, overexpression of the AtPol transgene in the mutant plants appeared to restore 1-bp insertion rates to the levels observed in wild-type plants at five of seven sites. At the other two sites, sites 1 and 4, the 1-bp insertion rates increased substantially, by 1.4- and 1.6-fold, respectively (Fig. 5b). When comparing the 1-bp insertion profiles, similar insertion patterns were observed between the overexpression and wild-type plants, with predominant templated insertions across all sites except one, site 8 (Fig. 5b). These results confirmed that overexpression of AtPol could restore, and may enhance, CRISPR-Cas9-induced templated and non-templated 1-bp insertions in the knockout mutant plants, further validating its pivotal role in generating 1-bp insertions.
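As a quick check of the fold increase reported at the CHLI2 site, the calculation can be sketched from the two averages given in the text: 25.3% for wild-type plants and 39.8% for the AtPol overexpression plants. The function name `fold_change` is an illustrative assumption.

```python
def fold_change(treated, control):
    """Ratio of a treated value to its control value."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return treated / control


# Average normalized 1-bp insertion rates (%) at the CHLI2 site, from the text.
wild_type_rate = 25.3
overexpression_rate = 39.8

fc = fold_change(overexpression_rate, wild_type_rate)
print(f"{fc:.1f}-fold")  # prints "1.6-fold"
```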

Normalized 1-bp insertion rates at CHLI2 (a) and MCSite_T (b) among three lines: wild-type plants (WT), Pol overexpression plants in the atpol-1 mutant background (atpol OE), and Pol overexpression plants in the wild-type background (WT OE). The normalized 1-bp insertion rates (Y-axis) were determined by dividing the number of reads containing 1-bp insertions by the total number of reads containing all types of indel mutations. Heatmaps under each plot illustrate the proportion of each inserted nucleotide type (T, A, C, G) at the 4th position. Data are presented as mean values ± SEM from 3 independent plants. P-values were derived from an unpaired one-tailed Student's t test. The source data are provided in the Source Data file.

We further hypothesized that overexpression of this gene could enhance 1-bp insertions in wild-type plants. To test this idea, we introduced the same overexpression construct into wild-type plants. Three T1 CRISPR-Cas9 transgenic plants with the wild-type background were used to survey CRISPR-induced mutations for each target site. At the CHLI2 site, the overexpression wild-type plants and the overexpression mutant plants showed a similar increase in the 1-bp insertion rate relative to the wild-type control plants (Fig. 5a). At the MCSite_T sites, the overexpression wild-type plants showed substantial increases in 1-bp insertion rates over the wild-type control plants at all seven sites, by 1.2- to 2.0-fold (Fig. 5b). When comparing the 1-bp insertion profiles, similar insertion patterns were observed among the overexpression wild-type plants, the overexpression mutant plants, and the wild-type control across all the sites, irrespective of their epigenetic states (Fig. 5b). Taken together, these observations corroborated that overexpressing the Pol homolog in wild-type plants could further increase 1-bp insertions.

To gain insights into the mechanism(s) underpinning the distinct properties of Pol across species, we conducted protein sequence analyses by aligning AtPol with X-family DNA polymerases in humans (Supplementary Fig. 5). Previous studies have indicated two conserved motifs in human X-family DNA polymerases that contribute to template dependency24. The first motif, identified as GSYRRG in template-dependent human DNA polymerases, features two amino acids, serine and tyrosine (SY), which are replaced by glycine and phenylalanine (GF) in the template-independent human TdT (Fig. 6a and Supplementary Fig. 5)24,25. The second motif, known as the YF motif, contains tyrosine and phenylalanine at the catalytically active sites of the DNA polymerases. In contrast, these two residues are changed to glycine and tryptophan (GW) in TdT (Supplementary Fig. 5)24,25. When analyzing these motifs in DNA polymerase homologs from Arabidopsis, Setaria, tobacco, and rice, the first motif was identical to the sequences in human Pol, while the second motif, characterized by alanine and tryptophan (AW), showed a closer resemblance to the GW motif found in human TdT (Fig. 6a and Supplementary Fig. 5). Thus, the plant Pol homologs appear to combine characteristic motifs from human Pol and TdT.

a Sequence alignment of two conserved motifs, SY and YF, across Human Pol, AtPol, SvPol, and human TdT. b Comparisons of templated versus non-templated insertion rates between the wild type AtPol and two variants, PolS366G/Y367F and PolA459Y/W460F at the CHLI2 site. The templated (indicated by orange) or non-templated (indicated by green) insertion rates (Y-axis) were determined by dividing the number of reads containing each type of 1-bp insertion by the total number of reads containing 1-bp insertions in each sample. c Normalized deletion rates between the wild type and two variants at the CHLI2 site. The normalized deletion rates (Y-axis) were determined by dividing the number of reads containing deletions within each category (1-bp, 2-10 bp, or >10 bp) by the total number of reads containing all types of deletions. Data are presented as mean values ± SEM from three independent plants. P-values were derived from an unpaired one-tailed Student's t test. The source data are provided in the Source Data file. d The proposed model for the dual activities of Pol in generating templated and non-templated 1-bp insertions. Step 1: CRISPR-Cas9 generates a blunt or staggered cut at the targeted site. Blunt-ended cleavages occur at the -3rd position upstream of the PAM (indicated by the red vertical lines) on both strands, while staggered cleavages take place with one cut at the 4th position on the non-targeted strand and the other cut at the -3rd position on the targeted strand, producing 5′ 1-nt overhangs. Step 2: The staggered product can be filled in by Pol with template-dependent activity. Step 3: The blunt-ended product can be processed by Pol with template-independent activity to extend 1 nt at the 3′ end of each strand. After ligation and correction by c-NHEJ and mismatch repair, non-templated 1-bp insertions occur at the 4th position. Additionally, cleavage products could be processed through either perfect ligation, indicated by the curved arrowheads, or through resection to generate deletions, indicated by the purple dashed lines.

The presence of both human Pol and TdT motifs could potentially contribute to the observed dual template-dependent and template-independent activities of AtPol. We then hypothesized that the dual activities of AtPol could be modulated by modifying each motif individually. To test this hypothesis, we generated two variants of AtPol through site-directed mutagenesis of the respective motifs. The first variant, AtPolYF, was engineered by substituting alanine and tryptophan (AW) with tyrosine and phenylalanine (YF) at the second motif to mimic human Pol (Fig. 6a). Similarly, the second variant, AtPolGF, was created to mimic human TdT by replacing serine and tyrosine (SY) with glycine and phenylalanine (GF) at the first motif (Fig. 6a).

The coding sequence of each AtPol variant was cloned into the T-DNA vector described above, with the constitutive Arabidopsis Ubiquitin-10 promoter and a CRISPR-Cas9 expression cassette to target the CHLI2 site. We used an Agrobacterium-mediated transient expression approach to transform individual T-DNA constructs into young seedlings of the atpol knock-out mutant, and then examined the CRISPR-Cas9 mutation profile at the CHLI2 site using the NGS assay (Supplementary Fig. 6a). The average mutation rates from these samples were 17.3% (AtPolWT), 12.7% (AtPolGF) and 10.3% (AtPolYF) (Supplementary Fig. 6b). When analyzing templated versus non-templated 1-bp insertion patterns, the samples expressing the wild-type AtPol gene exhibited a higher proportion of non-templated than templated insertions (57.6% versus 42.4%), consistent with the observations from the stable transgenic plants (Fig. 5a, b). In contrast, the samples transformed with the AtPolYF variant demonstrated altered 1-bp insertion profiles, with the templated insertion proportion significantly higher than that from overexpression of the wild-type AtPol, by about 100% (86.0% versus 42.4%; Fig. 6b and Supplementary Fig. 6c, d). Conversely, the samples transformed with the AtPolGF variant displayed a significantly higher proportion of non-templated insertions than the wild-type AtPol overexpression lines, by about 18% (67.9% versus 57.6%; Fig. 6b and Supplementary Fig. 6c, d). Regarding the deletion profiles, no evident differences were observed within three different deletion groups (1-bp, 2 to 10-bp, and more than 10-bp) among AtPolWT and the two variants (Fig. 6c).
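The percentage differences quoted for the two variants are relative increases over the wild-type AtPol proportions, not differences in percentage points. A minimal sketch of that arithmetic, using the proportions stated in the text, is shown here. The function name `relative_increase_percent` is an illustrative assumption.

```python
def relative_increase_percent(new, baseline):
    """Relative increase of new over baseline, expressed in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Proportions (%) of 1-bp insertions reported in the text.
# AtPolYF: templated insertions, variant versus wild-type AtPol.
yf_increase = relative_increase_percent(86.0, 42.4)
# AtPolGF: non-templated insertions, variant versus wild-type AtPol.
gf_increase = relative_increase_percent(67.9, 57.6)

print(f"AtPolYF templated: +{yf_increase:.0f}%")
print(f"AtPolGF non-templated: +{gf_increase:.0f}%")
```

The computed values, roughly 103% and 18%, are in line with the figures of 100% and 18% given in the text.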

Notably, the overall 1-bp insertion rates from the samples with each variant were reduced to 4.4% and 5.5%, compared to 31.7% in the wild-type AtPol overexpression control, suggesting the involvement of additional amino acids in regulating enzymatic activity (Supplementary Fig. 6e). Collectively, these observations align with our hypothesis that these two conserved motifs play crucial roles in modulating the dual template-dependent and template-independent activities of AtPol. Further investigation is required to refine the enzymatic activities of these variants.
