Knock-out of a GFP transgene
The first test of the CRISPR system in soybean was with a GFP (Green Fluorescent Protein)-expressing soybean line, as GFP knock-outs are easily observed by a loss of fluorescence. Two GFP-targeting gRNA vectors were designed: one gRNA to target the 5′ end of GFP (5′-target) and a second to target the 3′ end (3′-target) (Figure 1A). The vectors were introduced into the GFP line via A. rhizogenes to produce hairy roots. Fifteen of the 17 5′-target events and four of the 22 3′-target events were knock-outs, as evidenced by a loss of fluorescence under blue light (Additional file 1). Controls containing either Cas9 or the gRNAs alone all fluoresced (Additional file 1). Since the GFP soybean line used is homozygous for GFP, these results show that the CRISPR system is able to modify both GFP alleles, which is the only way to get loss of fluorescence.
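As a quick worked check of the counts reported above, the knock-out rate for each target can be computed directly. This is only an arithmetic sketch; the dictionary and function names are illustrative and are not part of the analysis used in the study.

```python
# Knock-out counts reported in the text: (knock-out events, total events).
counts = {"5'-target": (15, 17), "3'-target": (4, 22)}


def knockout_rate(knockouts, total):
    """Return the percentage of events that lost fluorescence."""
    return 100.0 * knockouts / total


rates = {name: round(knockout_rate(k, n), 1) for name, (k, n) in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}% knock-out events")
```

On these counts, roughly 88% of the 5′-target events and 18% of the 3′-target events were knock-outs.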
Custom-amplicon sequencing was used to determine the genetic modifications at the GFP transgene. The most abundant mutations at the 5′-target were short (1-21-nt) deletions (Figure 1, Additional file 2). For event 10, a wild-type sequence was observed in 16% of the reads, which is consistent with the fluorescence imaging (Figure 1 and Additional file 1). The 3′-target was less efficient; wild-type sequences were observed in seven of the events, with one event being completely unmodified (Additional file 2). Events with both wild-type and modified sequences may be due to a single GFP allele being modified, or to the presence of chimeric tissues. Four of the 3′-target events contained SNPs and one event contained a T insertion, whereas the 5′-target events did not contain any SNPs or insertions. A single SNP at the 3′-target was routinely observed in both the modified events and the Cas9 control, and may be due to errors during library preparation or sequencing.
Modifying a soybean gene
Given the successful modifications of the GFP targets, the next attempt was to modify the single-copy soybean gene, Glyma07g14530, which is a putative glucosyl-transferase. Glyma07g14530 custom amplicons from ten independent events were sequenced, and these showed a variety of mutations, including deletions, SNPs, insertions, and replacements (Additional file 2). Replacements are defined as two or more bases that were incorporated after a deletion event. Three events contained only modified sequences, six events had both wild-type and modified sequences, and one event had no modifications. These results indicate that both mono- and biallelic modifications were made and/or chimeric tissues were present.
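The three categories of events described above (only modified sequences, both wild-type and modified sequences, no modifications) can be expressed as a simple rule on read counts. The following is a minimal sketch and not the procedure used in the study: the function name, the `noise` parameter, and the read counts in the usage example are all hypothetical. The `noise` threshold is an assumption added to allow for background from sequencing errors, such as that seen in a Cas9-only control.

```python
def classify_event(wild_type_reads, modified_reads, noise=0.0):
    """Classify one event from its amplicon read counts.

    noise is a hypothetical background fraction (0-1); read classes at or
    below it are treated as absent. Returns 'modified only', 'mixed',
    or 'unmodified'.
    """
    total = wild_type_reads + modified_reads
    if total == 0:
        raise ValueError("event has no reads")
    modified_fraction = modified_reads / total
    wild_type_fraction = wild_type_reads / total
    if modified_fraction <= noise:
        return "unmodified"
    if wild_type_fraction <= noise:
        return "modified only"
    return "mixed"


# Hypothetical read counts, for illustration only.
print(classify_event(0, 500))                # modified only
print(classify_event(120, 380))              # mixed
print(classify_event(995, 5, noise=0.01))    # unmodified
```

Note that a "mixed" call cannot by itself distinguish a monoallelic modification from chimeric tissue; both produce wild-type and modified reads in the same sample.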
Targeting gene pairs
Soybean is a paleopolyploid [23] and thus most genes have a homoeolog. For functional genomic studies, it would be beneficial if the CRISPR system could be used to target the members of a homoeologous gene pair both individually and simultaneously. To test this, the soybean genes Glyma01g38150 and Glyma11g07220 (orthologs of the A. thaliana DDM1 gene) were targeted. Three gRNAs were designed: one to target Glyma01g38150 (01gDDM1), one to target Glyma11g07220 (11gDDM1), and a third to target both (01g + 11gDDM1). Both single-targeting gRNAs resulted in average indel frequencies greater than 70% (Figure 2). For 01gDDM1, eight events had indel frequencies of 87-97%. Two events had indel frequencies of only 1-2%, but these were still higher than the Cas9 control (0.14%). All but one of the 11gDDM1 events had indel frequencies greater than 95% (Figure 2). The 01gDDM1 gRNA was specific for the intended chr1 target, but the 11gDDM1 gRNA led to a small but detectable level (2-13%) of off-target modifications at the chr1 sequence (Figure 3).
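An indel frequency is the percentage of amplicon reads carrying an insertion or deletion at the target. The sketch below illustrates the idea under a strong simplifying assumption: a read is counted as an indel read whenever its length differs from that of the reference amplicon. It therefore ignores SNPs and any replacement that leaves the length unchanged. The function name and all sequences are hypothetical, and this is not the analysis pipeline used in the study.

```python
def indel_frequency(reads, reference):
    """Percentage of reads whose length differs from the reference amplicon."""
    if not reads:
        raise ValueError("no reads supplied")
    indel_reads = sum(1 for read in reads if len(read) != len(reference))
    return 100.0 * indel_reads / len(reads)


# Hypothetical 22-nt reference amplicon and four hypothetical reads.
reference = "ATGGCTAGCTAGGATCCGATCG"
reads = [
    reference,                                # wild type
    reference[:10] + "T" + reference[11:],    # SNP, same length: not counted
    reference[:8] + reference[12:],           # 4-nt deletion
    reference[:10] + "A" + reference[10:],    # 1-nt insertion
]

print(indel_frequency(reads, reference))  # 50.0
```

In practice, a frequency is only meaningful relative to the background measured in a control, which is why the 1-2% events above are compared with the 0.14% seen in the Cas9 control.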
Genetic modifications at both DDM1 genes were detected in events containing the 01g + 11gDDM1 gRNA, but the average indel frequency was only 21% for chr1 and 8.9% for chr11 (Figure 2). Average indel frequencies greater than 97% were observed in events targeting a different homoeologous gene pair Glyma04g36150 and Glyma06g18790 (A. thaliana MET1 orthologs), suggesting that the lower indel frequency of the 01g + 11gDDM1 vector is due to the gRNA itself and not a result of targeting multiple genes at once.
It is noteworthy that unique insertions of the A. rhizogenes root-inducing (Ri) plasmid [GenBank: AJ271050] were present in two 11gDDM1 events. The Ri insertions were identified in 4.8% of the reads from event 3 and 79.2% of the reads from event 4. Both insertions are from the left-border end of the Ri plasmid, approximately 1 kb apart from each other. Cloning and sequencing of event 4 showed a 252-bp insertion from the Ri plasmid (Additional file 3). These results are particularly interesting since it should be possible to increase the chances of obtaining targeted insertions, as has been shown with other nuclease systems [24-27].
Targeting MIR genes
MicroRNAs (miRNAs) are small RNA molecules responsible for regulating a wide range of processes in plants [28]. MicroRNAs are encoded by MIR genes that are typically short (~500 bp), non-coding sequences. These features, coupled with the genetic redundancy of MIR families, may decrease the likelihood of isolating MIR mutants in mutagenesis screens [29]. Thus, the specific targeting of Cas9, and the large number of targets for any given gene, may make the Cas9 system well suited for generating MIR mutants. Two soybean miRNAs, miR1514 and miR1509, were targeted with Cas9. The short length of the MIR genes limited the number of possible Cas9 targets. Finding a MIR1514 target near the mature miRNA was particularly difficult. Since mismatches are tolerated on the 5′ end of the gRNA [13], a C to G mismatch between the target and gRNA was made on the 5′ base (Figure 2) to get a target close to the mature miRNA. Indel frequencies greater than 95% were observed in all four miR1509-targeted events and in three of the four miR1514-targeted events. None of the short deletions (1-16 bp) were within the mature miRNA sequences; thus, none of the mutations are expected to alter the production of the miRNAs. However, these results demonstrate that short, non-coding sequences, such as MIRs, can be readily targeted by the CRISPR/Cas system.
Genetic modification of somatic embryos
Hairy roots are an excellent transgenic model system for soybean; however, they cannot generate whole plants, and therefore heritable mutations cannot be made. To evaluate CRISPR mutagenesis in whole plants, somatic embryo cultures of soybean were biolistically transformed with Cas9 constructs. Eight Glyma07g14530 and 24 01g + 11gDDM1 hygromycin-resistant events were recovered. Although each event contained portions of the gRNA and Cas9 genes as determined by PCR (data not shown), only two Glyma07g14530 and three 01g + 11gDDM1 events contained a complete Cas9 gene as determined by long-distance PCR (Figure 4A). When hairy-root events (Agrobacterium transformation) were screened, a full Cas9 product was observed in all ten events (Additional file 4A). These results suggest that the Cas9 gene fragmented during biolistic-mediated transformation, but not upon Agrobacterium-mediated transformation.
As with other Cas9 systems [10], the continued activity of Cas9 in the somatic embryos resulted in additional genetic modifications. DNA samples were taken from all events once there was enough tissue, approximately 2-4 weeks after selection, and used for amplicon sequencing. At this first sequencing time-point, event 24 had approximately 2.5% modified sequences on chr1 and chr11, whereas events 10 and 21 had none. Although individual modified sequences made up fewer than 1% of the reads in event 24 (Additional file 2), such deletions were not observed in any of the other 23 events sequenced, indicating that these deletions were not due to sequencing errors. When DNA was collected approximately two weeks after the first sequencing experiment, the indel frequency had increased to 4.3% in event 24. Events 10 and 21 had 20% and 4-5% modified sequences, respectively, for both targets (Figure 4B).
The two Glyma07g14530 events did not survive tissue culture, and no modifications were detected in DNA from their somatic embryos (data not shown). Individual embryos from event 24 ranged in indel frequency from 0-14%, with most of the embryos at 4% (Figure 4C). Therefore, continued expression of Cas9 leads to additional mutations during the development of these embryos.
Mutation efficiency
Of the nine targeting vectors used in this study, seven resulted in average indel frequencies greater than 70% (GFP 5′, 01gDDM1, 11gDDM1, Glyma04g36150, Glyma06g18790, miR1509, and miR1514). This mutation efficiency is ten-fold higher than the 3-7% obtained with transcription activator-like effector nucleases (TALENs) in soybean hairy roots [30].
In hairy roots, the 01g + 11gDDM1 vector had the lowest average, with 21% and 8.9% for the chr1 and chr11 targets, respectively. A similar frequency was observed in the somatic embryos (Figure 4B, C). It should be noted that the 01g + 11gDDM1 gRNA is one base shorter than the rest of the gRNAs in this study (GN19GG). However, this target length has been used in plants [31], and shorter gRNAs (GN18GG) have been shown to be as effective as the commonly used gRNA (GN20GG) in cultured human cells [32]. It seems unlikely that a shorter gRNA led to a decrease in indel frequency, but a thorough testing of gRNA lengths in plants has not been reported. Although each of the vectors had a range of indel frequencies, only four out of 88 (5%) hairy roots were unmodified, demonstrating that CRISPR mutagenesis in soybean is a robust system.
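The target notation used above (GN18GG, GN19GG, GN20GG) lends itself to a simple sequence search. The sketch below reads the notation literally, as a G followed by n arbitrary bases followed by GG; that reading is an assumption, as are the function names and the test sequence. It scans both strands and reports overlapping sites, and is not the target-selection procedure used in the study.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(seq, n=20):
    """Return (strand, start, site) for each G-N{n}-GG site on both strands.

    start is 0-based on the strand that was searched. A lookahead is used
    so that overlapping sites are all reported.
    """
    pattern = re.compile(rf"(?=(G[ACGT]{{{n}}}GG))")
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(s):
            hits.append((strand, match.start(), match.group(1)))
    return hits


# Hypothetical sequence containing a single GN20GG site on the + strand.
example = "TT" + "G" + "A" * 20 + "GG" + "TT"
print(find_targets(example, n=20))
print(find_targets(example, n=19))
```

Running the search on a short locus with n set to 18, 19, and 20 shows how allowing shorter targets increases the number of candidate sites, which matters most for short sequences.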
The three 01g + 11gDDM1 somatic-embryo events with the complete Cas9 gene contained targeted genetic modifications. These were three out of 24 hygromycin-resistant lines. These data demonstrate that when the complete Cas9 is incorporated, genetic modifications are made, although the complete Cas9 gene was incorporated in only 12.5% of the biolistically transformed events. Of the recent reports of CRISPRs being used in plants, several have shown the recovery of whole plants. One publication reported the biolistic transformation of rice, in which 9.4% and 7.1% of the T0 rice plants recovered contained mutations at their respective targets [31]. In that report, the Cas9 and gRNA cassettes were located on separate plasmids, and it is unclear if the complete Cas9 and gRNA cassettes were incorporated in all events. In contrast, transgenic A. thaliana and rice plants transformed with Agrobacterium tumefaciens had efficiencies of 20-90% for several targets [6,7,9,33]. Our data suggest that the disparity between biolistic and Agrobacterium-mediated transformation could be due to incomplete incorporation of the Cas9 gene upon biolistic-mediated transformation.
Types of mutations
The types of mutations obtained here are similar to those obtained in soybean and other plants with ZFNs [15,21], TALENs [17,18,30] and CRISPRs [31,33-35]: small deletions were the most frequent mutations, and SNPs were less common (Additional file 2).
The different targeting sequences tested led to a distinctive gamut of mutations. The seven most effective vectors almost exclusively generated short deletions, whereas the lower efficiency vectors contained more insertions/SNPs (Additional file 2). Of the ten 07g14530 events, seven had insertions of one or more bases. These results suggest that the differences were determined by either the target sequence or the gRNA. Therefore, multiple targeting vectors may be needed for any potential target sequence, depending on the frequencies/types of mutations desired. Obtaining a greater variety of mutations may be desirable when the intent is to produce an allelic series.
The types of mutations found in the hairy-root events and somatic embryos are consistent between chromosomal targets and between transformation methods. Within the ten 01g + 11gDDM1 hairy-root events, six contained an A insertion on chr1 at the same position. From those same ten events, five contained an A insertion on the homoeologous target on chr11 (Additional file 2). Each of the somatic-embryo events has the same A insertion for both chr1 and chr11, and in many cases, it is the most abundant read (Additional file 2). Given the consistent insertion pattern, it is tempting to speculate that there may be rules governing the types of mutations that are possible for a given target.
Evaluation of off-target modifications
One limitation of the CRISPR system is the potential for off-target modifications, i.e., the modification of sequences similar to the intended target sequence [13,36]. To determine the extent to which there may be off-target modifications, putative off-target sites were identified for the Glyma07g14530, DDM1, MET1, and miR1514 vectors. Each putative off-target site has two to six mismatches relative to the gRNA (Figure 3).
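The number of mismatches between a gRNA and a putative off-target site is a position-by-position comparison of two sequences of equal length. The sketch below shows that comparison; the function names and both sequences are hypothetical and do not correspond to any gRNA or locus in this study.

```python
def mismatch_positions(grna, site):
    """Return 1-based positions (from the 5' end) where the sequences differ."""
    if len(grna) != len(site):
        raise ValueError("sequences must be the same length")
    pairs = zip(grna.upper(), site.upper())
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


def count_mismatches(grna, site):
    """Return the number of mismatched positions between gRNA and site."""
    return len(mismatch_positions(grna, site))


# Hypothetical 20-nt gRNA and a hypothetical putative off-target site.
grna = "GACCTGATCGATTGCAAGCT"
site = "GACCTGTTCGATTGCAAGGT"

print(count_mismatches(grna, site))    # 2
print(mismatch_positions(grna, site))  # [7, 19]
```

Reporting the positions as well as the count is useful because the location of a mismatch within the gRNA, and not only the total number, can influence whether a site is cut.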
Two gRNAs created off-target mutations. The 11gDDM1 chr1 off-target was modified in 2-13% of the sequenced reads, which is considerably lower than the indel frequency at the intended chr11 target (95-100%). When off-targeting occurred at miR1514 18g, the frequencies ranged widely: 100%, 25%, and 5%. The 07g14530-15g and -17g off-target loci had indel frequencies of 2.8% and 2.2%, respectively. However, these elevated indel frequencies were also observed in the Cas9 control, showing that they were due to sequencing errors caused by long stretches of T's in the amplicons. These results indicate that while off-targeting does occur, at least for the tested gRNAs, it is not common and was generally at a much lower frequency than at the intended target.
gRNA vector construction
In this work, a rapid cloning method (Additional file 5) was developed to create new gRNAs. It consists of a single PCR with two 41-bp primers followed by an In-Fusion® reaction, and can be used to clone any gRNA target sequence. The pUC gRNA shuttle vector makes the construction of gRNAs simple and inexpensive. The use of the In-Fusion® cloning system has the benefit of reducing handling steps, to the point where it should be simple to automate the entire cloning process. Binary Cas9 vectors with four different selectable markers (nptII, GFP, hygromycin, bar) were also created to facilitate plant transformation experiments.