Integration, abundance, and transmission of mutations and transgenes in a series of CRISPR/Cas9 soybean lines
BMC Biotechnology volume 20, Article number: 10 (2020)
As with many plant species, current genome editing strategies in soybean are initiated by stably transforming a gene that encodes an engineered nuclease into the genome. Expression of the transgene results in a double-stranded break and repair at the targeted locus, oftentimes resulting in mutation(s) at the intended site. As soybean is a self-pollinating species with 20 chromosome pairs, the transgene(s) in the T0 plant are generally expected to be unlinked to the targeted mutation(s), and the transgene(s)/mutation(s) should independently assort into the T1 generation, resulting in Mendelian combinations of transgene presence/absence and allelic states within the segregating family. This prediction, however, is not always consistent with observed results.
In this study, we investigated inheritance patterns among three different CRISPR/Cas9 transgenes and their respective induced mutations in segregating soybean families. Next-generation resequencing of four T0 plants and four T1 progeny plants, followed by broader assessments of the segregating families, revealed both expected and unexpected patterns of inheritance among the different lineages. These unexpected patterns included: (1) a family in which T0 transgenes and mutations were not transmitted to progeny; (2) a family with four unlinked transgene insertions, including two respectively located at paralogous CRISPR target break sites; (3) a family in which mutations were observed and transmitted, but without evidence of transgene integration or transmission.
Genome resequencing provides a high-resolution view of transgene integration structures and gene editing events. Segregation patterns of these events can be complicated by several potential mechanisms. These include, but are not limited to, plant chimeras, multiple unlinked transgene integrations, editing of intended and paralogous targets, linkage between the transgene integration and target site, and transient expression of the editing reagents without transgene integration into the host genome.
Modern genome engineering provides the ability to make targeted modifications to genomes. Some of the most popular systems for genome engineering involve delivering a reagent to the cell that induces a double-stranded break (DSB) at a specific DNA sequence, thereby initiating the repair/modification process. Reagent platforms include zinc-finger nucleases and TAL effector nucleases, which can each be engineered as proteins that recognize and create DSBs at specific DNA sequences. These platforms have been used to modify genes in numerous different organisms, including plant species [1,2,3,4,5,6]. More recently, CRISPR/Cas9 has become a popular genome engineering platform, and has been used across a variety of species due to its ease of construction and range of sequences it is able to target [7,8,9]. The plant research community has rapidly adopted the CRISPR/Cas9 system, including as a tool for modifying and enhancing different crop species [10,11,12,13,14,15,16,17,18]. This type of genome editing/engineering provides a toolkit for modifying DNA in a gene-specific manner, allowing researchers, geneticists, and breeders to move beyond the ordinary boundaries of germplasm and genetic variation.
In crop plant species, the majority of trait-driven editing applications have focused on creating targeted gene knockouts, with many such efforts using CRISPR/Cas9 editing reagents [10,11,12,13, 15, 16, 19,20,21,22,23]. Often, this process involves delivering a transgene to the plant genome that encodes the CRISPR guide RNAs (gRNAs) and Cas9 protein. Expression of these reagents in the T0 generation can generate mutation(s), which can be transmitted to subsequent generations. Moreover, the CRISPR/Cas9 transgene will in many instances not be linked to the mutation(s). Therefore, the breeder/geneticist can specifically select for segregating individuals in the subsequent generation that carry the desired mutated allele and no longer harbor the transgene.
In soybean, there are two main methods to create stable transgenic plants: Agrobacterium-based methods and biolistics. Agrobacterium-mediated transformation uses specific strains of either Agrobacterium rhizogenes or A. tumefaciens as a means to deliver a vector containing a transfer DNA (T-DNA) cassette into the soybean host [24,25,26,27]. Biolistics is a direct gene transfer mechanism that uses high-velocity microprojectiles to introduce foreign DNA into tissues, resulting in non-homologous integration of transgenic DNA into the genome [28,29,30,31,32,33,34].
Soybean genes have been successfully modified using CRISPR/Cas approaches in both somatic and germline transmissible cells and for a variety of agronomic traits [35,36,37,38,39,40,41,42]. One recent study carefully tracked the transmission of mutations and transgenes from T0 soybean plants to the next generation. In this study, Agrobacterium was used to transform CRISPR/Cas9 into whole soybean plants to knock out genes involved in small RNA pathways. Curtin et al. targeted three genes, GmDrb2a, GmDrb2b and GmDcl3a, and generated mutations at each target site in the T0 generation.
The GmDrb2 CRISPR construct used two guide RNAs that each recognized both the GmDrb2a and GmDrb2b loci. The resulting transformation yielded two T0 plants derived from the same cluster of cells. From these two events, Curtin et al. detected four small deletions at the GmDrb2a locus that were in common for both transgenic events. Screening of the GmDrb2b locus revealed two small deletions shared between the transgenic events and a 6 bp deletion unique to one of the T0 plants. Using next-generation sequencing, they identified three separate transgenic insertion events in the same locations for both T0 plants. After self-pollinating the T0 plants to the T1 generation, PCR screening for mutations revealed that only two of the four small deletions at GmDrb2a were transmissible. Similarly, only two of the three small deletions at the GmDrb2b locus were transmissible. Further analysis of each of the three transgenic insertions in the T1 revealed that each locus was transmissible.
Meanwhile, a different CRISPR/Cas9 construct was designed to target GmDcl3a. Analysis of the GmDcl3a CRISPR mutations in two separate T0 plants identified a total of three different small deletions and one small insertion at the target site. PCR screening and next-generation sequencing of the T0 plants revealed a single transgenic insertion event in one of the plants and no evidence for transgenic insertion in the other (the latter of which was corroborated by sequence data). The authors then analyzed 60 T1 plants from each event and failed to identify any transmitted mutations or transgene integration events in either lineage.
The inconsistent transmission of mutations and transgenes observed among the soybean CRISPR/Cas9 lines in Curtin et al. is based on a small number of plants/events. Therefore, in this study, we sought to expand upon this work by investigating more lines to identify expected and/or novel outcomes. We sequenced four T0 parents and four offspring of transgenic CRISPR/Cas9 lines to study the effects of CRISPR/Cas9 at gRNA target sites, as well as variation induced by transgenic insertion events into the genome. The transformed lines studied in this experiment demonstrate a range of potential outcomes of CRISPR/Cas9 mutagenesis in soybean using an Agrobacterium-mediated transgenesis system.
Identification of CRISPR mutations at target sites in T0 plants
Three separate whole-plant transformation (WPT) series named WPT536, WPT553, and WPT608 were generated using the expression vectors diagramed in Fig. 1. Each vector used a constitutive promoter (Gmubi or Cauliflower mosaic double 35S [43, 44]), a Cas9 endonuclease (soybean codon optimized or Arabidopsis thaliana codon optimized), a single or double gRNA cassette driven by either the A. thaliana U6 or 7sL promoter, and a gene encoding resistance to either glufosinate (BAR) or hygromycin (Fig. 1, in Additional file 1: Table S1). Guide-RNA cassettes were constructed and inserted into each WPT destination vector. WPT536 and WPT553 each targeted a single locus on one gene model, Glyma.16G090700 and Glyma.18G041100, respectively (Table 1). WPT608 included two gRNAs targeting gene model Glyma.16G209100. One of these gRNAs had a perfect match to the target site on Glyma.16G209100 and a nearly perfect match to its paralog gene model Glyma.09G159900 (it had a 1 bp mismatch 16 bp from the PAM site). The other gRNA for gene model Glyma.16G209100 failed to result in mutations and is not further discussed below. Each destination vector was transformed into the background Bert-MN-01, and DNA was extracted from putatively transformed T0 plants.
PCR-based gel assays (as previously described) were used to screen for mutations at the intended sites for each T0 plant. Four T0 plants were identified with putative mutations, one each from the WPT536 (individual WPT536–2) and WPT553 (individual WPT553–6) series, and two from the WPT608 series (individuals WPT608–1 and WPT608–3). Sequencing of PCR amplicons at each of the target sites for these four T0 plants revealed mutations (details are provided in the sections below). These four plants and some of their progeny were tracked for the inheritance of the targeted mutations and transgene integration loci.
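The mismatch position quoted above for the WPT608 gRNA can be illustrated with a short comparison of a guide against a candidate target site. This is only a sketch: the two 20-nt sequences are hypothetical placeholders (not the actual gRNA or soybean sequences), and the convention that position 1 is the base adjacent to the PAM is an assumption made for the example.

```python
def mismatches_from_pam(guide: str, site: str) -> list[int]:
    """Return mismatch positions counted from the PAM-proximal (3') end.

    Assumption for this sketch: both sequences are written 5'->3', the PAM
    lies immediately 3' of the last base, and that last base is position 1.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must be the same length")
    n = len(guide)
    return [
        n - i
        for i, (g, s) in enumerate(zip(guide.upper(), site.upper()))
        if g != s
    ]


# Hypothetical 20-nt protospacers (placeholders, not the WPT608 sequences).
guide = "GATTACAGATTACAGATTAC"
intended_site = "GATTACAGATTACAGATTAC"  # perfect match
paralog_site = "GATTTCAGATTACAGATTAC"   # one substitution near the 5' end

print(mismatches_from_pam(guide, intended_site))  # no mismatches
print(mismatches_from_pam(guide, paralog_site))   # one PAM-distal mismatch
```

A single PAM-distal mismatch of this kind is often tolerated by Cas9, which is consistent with the editing observed at both paralogs in the WPT608 lines.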
WPT536–2: expected transmission and segregation patterns from single transgene and mutation events
WPT536–2 was a T0 plant transformed with a Gmubi-driven Glycine max codon-optimized Cas9 and a single gRNA targeting Glyma.16G090700 (herein known as GmRin4b). PCR confirmed the presence of the Cas9 and plant-selectable marker (in Additional file 2: Fig. S1), indicating successful transformation of the construct. Sequencing of a PCR amplicon from the gRNA target site revealed a 2 bp deletion. Whole genome sequencing (WGS) of the T0 plant confirmed the previously identified 2 bp deletion along with evidence of a 1 bp insertion at the target site (Fig. 2a, in Additional file 2: Fig. S2).
Furthermore, WGS revealed a single CRISPR/Cas9 transgene integration site localized to an interval on chromosome 11 (Fig. 2b, in Additional file 2: Fig. S3). The interval had a 35 bp hemizygous deletion and a 4 bp addition flanking one side of the transgenic insertion (Table 2). The reads spanning the genome into the transgene suggest that a complete cassette, from the right border (RB) to the left border (LB), was inserted within the deleted region. Given the presence of both a transgene and mutation, the generation of this plant was renamed T0/M0.
PCR screening of GmRin4b mutations in the segregating T1/M1 and T2/M2 generations revealed germline transmission of the transgene. However, these assays revealed that four out of 27 T1/M1 plants no longer carried the transgene (such plants can simply be identified as M1 progeny, as they do not have the transgene). Confirmation of this result for the M1 plant WPT536–2-13 and its M2 progeny is shown in Additional file 2: Fig. S1. WGS was performed on two M2 progeny from WPT536–2-13 (plants WPT536–2–13–15 and WPT536–2–13–16). To further validate that there was no trace of transgenic DNA, reads from WGS were mapped directly to the transgene for each sequenced plant (in Additional file 2: Fig. S4). Only the T0 parent had consistent coverage across the transgene, while the progeny plants lacked any reads mapping to the transgene except for the Gmubi promoter, which can be attributed to the natural ubiquitin promoter sequences located in the soybean genome. Furthermore, the WGS revealed that the M2 plant WPT536–2–13–16 retained the 2 bp mutation at the CRISPR target site while plant WPT536–2–13–15 segregated back to homozygosity for the wild-type allele. Given these findings, it was determined that plant WPT536–2–13–16 is a simple M2 generation plant (contains a mutation, but no transgene), while plant WPT536–2–13–15 is neither a transgenic nor a mutant individual. This segregation represents expected Mendelian outcomes, wherein the respective transgenic and mutated loci could be selected for or against in subsequent generations.
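As an illustration of what "expected Mendelian outcomes" means numerically, the counts reported above (four transgene-free plants out of 27 T1/M1 progeny) can be compared with the 3:1 presence/absence ratio expected when a single hemizygous insertion is selfed. The sketch below is not an analysis from the study; it assumes a single hemizygous locus and uses a plain chi-square goodness-of-fit test with one degree of freedom, which is only approximate for a family of this size.

```python
import math


def chi_square_two_class(observed, expected_fractions):
    """Chi-square goodness of fit for two classes (1 degree of freedom).

    Returns (statistic, p_value). For 1 d.f. the upper-tail probability of
    the chi-square distribution equals erfc(sqrt(x / 2)).
    """
    total = sum(observed)
    chi2 = sum(
        (obs - total * frac) ** 2 / (total * frac)
        for obs, frac in zip(observed, expected_fractions)
    )
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# 23 plants carrying the transgene, 4 without it (27 T1/M1 plants in total).
chi2, p_value = chi_square_two_class([23, 4], [0.75, 0.25])
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the statistic is about 1.49 and the p-value is well above 0.05, so the observed counts do not depart significantly from a 3:1 ratio.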
WPT608–1: T0 transgenes and mutations were not transmitted to progeny
Gene models Glyma.16G209100 and Glyma.09G159900 were targeted by CRISPR/Cas9 using a construct nearly identical to that used by Curtin et al., with the only modification being the gRNA target site. PCR screening revealed that two lines, WPT608–1 and WPT608–3, had evidence for mutations at recognition sites on chromosomes 9 and 16 from a single gRNA, as well as evidence of transgene integrations into the genome. WGS of 608–1 confirmed the presence of a 1 bp insertion and two different 4 bp deletions as seen by PCR (Fig. 3a). Furthermore, an additional target site on the paralogous gene model Glyma.09G159900, which has an identical gRNA recognition site, also showed evidence for mutation, as 20% of the T0 reads had a 4 bp deletion at the target site (Fig. 3a).
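A read-level figure such as "20% of the T0 reads had a 4 bp deletion" can be derived from alignment records by counting, among reads that span the target site, those whose CIGAR string contains a deletion of the given length inside the site. The following is a minimal sketch under stated assumptions: reads are given as (1-based leftmost position, CIGAR) pairs, and the coordinates and reads are hypothetical. It is not the procedure used in the study, where alignments were inspected in IGV.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def deletion_fraction(reads, site_start, site_end, del_len):
    """Fraction of site-spanning reads carrying a del_len deletion in the site."""
    spanning = carrying = 0
    for pos, cigar in reads:
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
        ref_end = pos + sum(n for n, op in ops if op in REF_CONSUMING) - 1
        if pos > site_start or ref_end < site_end:
            continue  # read does not cover the whole target site
        spanning += 1
        ref = pos
        for n, op in ops:
            if op == "D" and n == del_len and site_start <= ref <= site_end:
                carrying += 1
                break
            if op in REF_CONSUMING:
                ref += n
    return carrying / spanning if spanning else 0.0


# Hypothetical reads around a target site at reference positions 100-120.
example_reads = [(50, "125M")] * 4 + [(50, "60M4D61M"), (130, "125M")]
print(deletion_fraction(example_reads, 100, 120, 4))
```

In this toy input, five reads span the site and one carries the 4 bp deletion; the read starting at position 130 is ignored because it does not cover the site.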
WGS identified a single transgene integration site on chromosome 17 for WPT608–1 (Fig. 3b). The T-DNA segment induced a 1 bp deletion at the transgene integration site with a 9 bp insertion flanking the transgenic segment (Table 2). Reads that spanned the genomic-transgene junction revealed that a portion of the right border inserted itself into that location. The transgenic sequences at the left junction were undetectable due to the lack of any chimeric reads aligning to that segment of the genome.
PCR assays could not detect any presence of mutations or the transgene in the T1/M1 generation among 22 tested plants, suggesting that neither the mutations nor the transgenic insertion event were germline transmissible from WPT608–1 (in Additional file 2: Fig. S5). Therefore, the WPT608–1 event appears most likely to be an instance where the T0 plant was chimeric, and the transgenic/mutated sector did not produce seeds. Alternative hypotheses may also explain this outcome, such as the possibility that the transgenic and mutated sequences originated from different sectors, with the mutations being driven by transient expression of the reagents. In any case, mutations appear to have been produced in some somatic cells of the T0 plant but did not reach the germline.
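A short calculation shows why simple segregation is an unlikely explanation for this result. If a single hemizygous transgene (or heterozygous mutation) had been present in the germline, each selfed progeny plant would lack it with probability 1/4, so the chance that all 22 tested plants lack it is (1/4)^22. The sketch below assumes independent progeny and ordinary Mendelian transmission; it is an illustration and not a calculation reported in the study.

```python
def prob_no_carriers(n_progeny: int, p_null: float = 0.25) -> float:
    """Probability that none of n_progeny selfed offspring carries a locus.

    p_null is the per-plant probability of inheriting no copy; 0.25 is the
    value expected for a hemizygous or heterozygous parent.
    """
    return p_null ** n_progeny


print(f"{prob_no_carriers(22):.2e}")
```

The resulting probability is vanishingly small, which supports the interpretation that the transgene and mutations were absent from the cell lineages that produced seed.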
WPT608–3: mutations and transgene integrations at the CRISPR target sites
WGS of 608–3 revealed four separate transgene insertion events on chromosomes 6, 9, 16, and 18 (Fig. 3c, in Additional file 2: Fig. S6). The event on chromosome 6 induced an 8 bp deletion in the host genome while inserting 3 and 20 bp additions on either side of the transgene integration site (Table 2). Analysis of the reads spanning the genomic/transgene junctions suggests that there was a partial insert of half of the transgene from the RB to halfway through the cassette. The transgenic insertion event on chromosome 18 deleted 3 bp of the host genome and created a more complex transgenic insertion event. The transgenic sequence detected on the left junction was in the antisense orientation while the sequence on the right junction was in the sense orientation, suggesting that there were multiple insertions/rearrangements of the transgene at that location (Fig. 3c).
The transgene integration site on chromosome 16 was observed within the CRISPR gRNA target site on gene model Glyma.16G209100 (in Additional file 2: Fig. S7). The sequenced regions flanking the transgene integration site indicated that 1 bp of the host genome was deleted while inserting a full transgene cassette. Furthermore, the transgene integration site on chromosome 9 was also observed within a CRISPR gRNA target site on the paralogous gene model Glyma.09G159900 (in Additional file 2: Fig. S8), where it created a 10 bp deletion in the host genome. There was also an 11 bp insertion flanking the sequence of one end of the chromosome 9 transgene integration site (Table 2). Reads spanning the junctions of both the chromosome 9 and chromosome 16 events suggest that a full transgene cassette was inserted into both locations.
All six of the tested WPT608–3 T1 progeny showed inheritance of the transgene integration event at the Glyma.16G209100 locus (in Additional file 2: Fig. S6). Two of the six progeny (plants WPT608–3-2 and WPT608–3-3) were homozygous for this transgene integration event (in Additional file 2: Figs. S6 and S9). PCR and sequencing assays for two other WPT608–3 T1/M1 progeny (plants WPT608–3-1 and WPT608–3-5) confirmed germline transmission of the 1 bp insertion allele at Glyma.16G209100 (in Additional file 2: Fig. S9). Meanwhile, the transgene insertion at the paralogous locus Glyma.09G159900 was only inherited by four of the six progeny, and none were homozygous for this event (in Additional file 2: Fig. S6). Furthermore, all six of these plants showed evidence for inheriting the 1 bp insertion allele at Glyma.09G159900 (in Additional file 2: Fig. S9).
In summary, WPT608–3 represents a unique T0 plant in which two of the four transgene integration sites were located at the gRNA target sites. Presumably, this was caused by CRISPR/Cas9 induction of double-stranded breaks at the paralogous target sites that were repaired by transgene integration during the transformation process.
WPT553–6: unresolved transgene inheritance in a line with germline mutations
The CRISPR/Cas9 construct targeting Glyma.18G041100 (herein known as GS1) was developed as a result of a previous study and shown to be effective at generating mutations in soybean somatic hairy root tissues. We used the same construct in whole-plant transformation to generate the WPT553 series of plants for the present study. PCR screening and WGS of the WPT553–6 T0/M0 plant revealed the presence of the transgenic sequence and two different 7 bp deletions at the target site (Fig. 4a). Sequencing of the progeny plants 553–6-8 and 553–6-11 identified a 2 bp and a 6 bp mutation in the respective plants. Neither of these mutated alleles was identified in the T0/M0 parental plant (in Additional file 2: Fig. S10). Furthermore, the plant-selectable marker and the Cas9 were not detected by PCR in the 553–6-8 and 553–6-11 plants, nor were these transgene components detected in any of the 31 putative T2/M2 offspring (Fig. 4b). Aside from the 553–6-8 and 553–6-11 individuals, none of these plants showed evidence for mutations at the target site.
To help detect chimeric transgenic or mutation events, leaf tissue was pooled from different parts of plant WPT553–6, and DNA was prepared for WGS. Similar pooling strategies were also applied within each of the 553–6-8 and 553–6-11 offspring plants. Despite the PCR evidence indicating the presence of transgenic sequences (Fig. 4a), WGS analyses were not able to identify any transgene integration sites in the WPT553–6 T0 plant. Furthermore, no such integration sites were identified in the 553–6-8 and 553–6-11 offspring. When mapping the reads of each plant directly to the transgene (in Additional file 2: Fig. S11), only the WPT553–6 T0 plant had reads that consistently mapped to the transgene. However, the average read coverage for the transgene was far below that of the WPT plants described in previous sections that exhibited heritable transgenic insertion events. WGS mapping of reads to the transgene sequence for 553–6-8 and 553–6-11 yielded only 7 and 1 mapped reads, respectively (in Additional file 2: Fig. S11). Therefore, the extremely low mapping coverage to transgenic sequences observed in the WPT553 plants may be better explained by trace levels of sample contamination, or by cross-contamination due to template switching within barcoded libraries, than by the presence of a stably integrated transgene. We therefore speculate that the initial mutagenesis observed in the WPT553–6 T0 plant may have been derived from a non-integrated CRISPR/Cas9 transgene, which may explain the transmission of mutated alleles with minimal evidence for transmission of any transgene components.
Sequence microhomology near transgenic insertion sites
Analyses of the transgenic integration sites revealed evidence of sequence micro-homologies between the inserted transgenic DNA and the host sequences flanking the insertion. We aligned the putative host genome sequence (based on the Williams 82 reference genome), the transgene construct sequence, and the observed sequence at the transgene integration junction to identify potential sites of microhomology (Fig. 5). The junction for the T-DNA insertion typically exhibited sequence matches for three or four base-pair (bp) tracts in regions flanking the transgene integration site. For instance, the chromosome 18 integration site in plant 608–3 had a perfect match in homology between the construct and the host genome sequence in the region flanking the 5′ end of the insertion, while the microhomology on the 3′ end was shifted three bp between the host genome and transgenic sequence (Fig. 5). While the 5′ junction of 608–3 on chromosome 18 was the only instance of a perfect micro-homology match, 8 out of the 11 junctions that were detected were within 3 bp of one another, while 2 of the 11 were within 9 bp of one another. Interestingly, each microhomology sequence across all 11 junctions contained a homopolymer sequence of at least 2 bp.
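The idea of searching for short shared tracts between the host sequence and the construct sequence at a junction can be sketched as a longest-common-substring search over two short flanking windows. This is an illustrative simplification, not the alignment procedure used for Fig. 5, and the two sequences below are hypothetical placeholders rather than real junction sequences.

```python
def longest_shared_tract(host: str, transgene: str, min_len: int = 2) -> str:
    """Return the longest substring shared by the two flanking windows.

    Brute-force comparison, adequate for windows of a few tens of bases.
    Returns an empty string if nothing of at least min_len is shared.
    """
    host, transgene = host.upper(), transgene.upper()
    best = ""
    for i in range(len(host)):
        for j in range(len(transgene)):
            k = 0
            while (
                i + k < len(host)
                and j + k < len(transgene)
                and host[i + k] == transgene[j + k]
            ):
                k += 1
            if k > len(best):
                best = host[i:i + k]
    return best if len(best) >= min_len else ""


# Hypothetical 10-bp windows flanking a junction (placeholders only).
host_flank = "ACGATTGCAA"
transgene_end = "TTCATTGGTC"
print(longest_shared_tract(host_flank, transgene_end))
```

In this toy case the shared tract is 4 bp long and contains a 2 bp homopolymer, similar in scale to the three or four bp tracts described above.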
Resequencing of four T0 plants and selected progeny provided a high-resolution view of transgene integration structures and gene editing events. The four T0 plants each exhibited a different outcome, though each outcome parallels similar findings in the recent crop genome editing literature. Plant WPT536–2 exhibited the most straightforward scenario, in which a single transgene integration produced frameshift mutations at a single target site. The transgene and mutations transmitted and segregated in the progeny, as is generally the desired outcome for the majority of such experiments and has often been reported in previous studies [41, 42, 47,48,49,50,51,52,53,54,55,56].
Plant WPT608–1 exhibited evidence for a single transgene integration and targeted mutations at two paralogous loci. However, neither the transgenes nor mutations were recovered in the progeny. This type of negative result may be commonplace in genome editing projects, but it is an undesirable outcome for most projects and is likely to be unreported in scientific articles. There are different mechanisms that may explain this result, including the possibility that WPT608–1 was a chimeric plant in which the transgene and mutations were part of a sector that did not produce seeds. It is noteworthy that the DNA used to resequence plant WPT608–1 was pooled from five different leaflets growing on different branches of the plant. Perhaps only one or two branches harbored the transgene and mutations, and these failed to produce seeds. Probably the simplest explanation would be that somatic mutations were identified in the T0 but, by chance and circumstance, none of the meristems that eventually produced offspring harbored such mutations. While this hypothesis remains untested, there are additional speculations and hypotheses that could be suggested to explain the observed result.
Plant WPT608–3 exhibited an unexpected phenomenon in which two paralogous CRISPR target sites were each found to harbor CRISPR/Cas9 transgenes. The process to create such loci is somewhat analogous to a previously described non-homologous end-joining strategy used to insert a specific T-DNA segment into a specific genomic locus. In this strategy, the editing reagent (e.g., the CRISPR/Cas9) is designed to simultaneously cut both the intended T-DNA segment from the transgene and the genomic target where the T-DNA is to be inserted. In effect, the released T-DNA segment acts as a donor molecule that can be integrated into the genomic target site during the double-stranded break repair. In the case of plant WPT608–3, it appears that when the full transgene was delivered to the cell it generated double-stranded breaks at the intended paralogous loci, and then copies of the transgene were used to repair the targeted double-stranded breaks. Site-specific T-DNA integration in plants has been previously reported in the literature [38, 58,59,60,61,62], though it is not common and we are unaware of any examples in which two unlinked (in this case, paralogous) target sites acted as transgene integration loci in a single cell. Importantly, all four transgenic loci in the T0 plant were shown to segregate in subsequent generations. Furthermore, a simple frameshift allele for gene model Glyma.16G209100 was also shown to segregate in these generations. Therefore, a researcher could select for progeny that specifically carry the frameshift allele and no longer harbor the transgenes, if such an outcome is desired.
Plant WPT553–6 exhibited a unique outcome in which the T0 plant exhibited the presence of mutations at the targeted locus (Glyma.18G041100); however, resequencing data could not confirm integration of the CRISPR/Cas9 transgene. Analysis of progeny indicated that a small number of plants (two out of 31) carried mutations, while none of the plants harbored the transgene. On the surface, this appears to be a highly favorable outcome, as transmissible mutations were recovered in an apparently non-transgenic background. However, this may be a difficult result to reproduce, as it would seem to require transient expression of the transgene without integrating into the host genome, thereby generating mutations in a non-transgenic background. Zhang et al. reported a purposeful identification of such plants in wheat, wherein the authors specifically screened plants bombarded with CRISPR/Cas9 constructs for individuals carrying mutations and no transgenes. This process was able to identify plants of this type but required extensive screening of large populations to identify these rare events. In the case of WPT553–6, it is also possible that the transgene did insert stably into the genome, but was located in a region of the genome that is difficult to map and/or a structurally rearranged T-DNA was inserted such that it was not detected by PCR or resequencing. Alternatively, as discussed for WPT608–1 above, it is possible that the WPT553–6 transgene integration may have disrupted a critical process for gametophyte or early sporophyte survival and was thus not able to be recovered in the progeny. This would not entirely explain the inability to identify the transgene integration site in the T0 plant, but would provide an explanation for the failure to transmit the transgene to progeny.
Regardless of the construct used in each whole-plant transformed line, each junction displayed evidence of microhomology between the reference genome and the transformation vector. While the distribution of integration sites is spread throughout the genome, the evidence of microhomology flanking each of the transgene integration sites further reinforces that this process is not entirely random.
Despite the complications of working with these complex plants, there is a high probability of recovering a desired product using the CRISPR/Cas9 technology in soybean. In this study, we used two different Cas9 endonucleases, and they yielded similar mutation profiles between events. While the mutations observed were all under 7 bp in size, all but one mutation induced at a gRNA target site created a frameshift mutation, most likely knocking out the function of the target gene. In the case of multiple transgene insertions, it may be difficult to completely segregate away all the transgenic copies in subsequent generations. However, additional backcrosses or outcrosses can be used to remove these loci, as demonstrated by Curtin et al. This is a relatively minor inconvenience, given the capacity to generate vast and novel allelic diversity for so many loci.
The results described in this study highlight the range of outcomes one might expect from strategies that rely on stable transformation of a DNA editing construct. Such experiments can be complicated, as they typically require a minimum of two loci of interest, the transgene integration site(s) and the targeted region(s). This quickly becomes more complex when there are multiple unlinked transgene integrations and when there are multiple gene editing targets. Furthermore, unexpected segregation patterns may be driven by several potential mechanisms, such as plant chimeras, editing of intended and paralogous targets, linkage between the transgene integration and target site, and transient expression of the editing reagents without transgene integration into the host genome. Genome resequencing provides a high-resolution view of transgene structures and editing events, enabling researchers to diagnose both the expected and unexpected segregation outcomes from these lineages.
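The difficulty of segregating away multiple transgene copies can be made concrete with a small calculation. Assuming each insertion is hemizygous, all insertions are unlinked to one another and to the target, and the mutation is heterozygous in the selfed parent, the expected fraction of progeny free of k insertions is (1/4)^k, and the fraction that is also homozygous for the mutation is a further factor of 1/4 lower. These assumptions are illustrative only; they would not hold, for example, for insertions located at the target site itself, as seen in WPT608–3.

```python
import math


def target_fraction(n_transgene_loci: int, want_homozygous_mutant: bool = True) -> float:
    """Expected fraction of selfed progeny lacking every transgene insertion.

    Assumes hemizygous, mutually unlinked insertions. If requested, also
    requires homozygosity for an unlinked, heterozygous mutation.
    """
    fraction = 0.25 ** n_transgene_loci
    if want_homozygous_mutant:
        fraction *= 0.25
    return fraction


def progeny_to_screen(fraction: float, confidence: float = 0.95) -> int:
    """Smallest family size giving at least `confidence` chance of one hit."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


for k in (1, 2, 4):
    frac = target_fraction(k)
    print(k, frac, progeny_to_screen(frac))
```

With one insertion a few dozen progeny suffice, whereas with four unlinked insertions the required family size rises into the thousands, which is why additional backcrosses or outcrosses can be the more practical route.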
Generation of whole plant transformant expression vectors
Plant expression vectors were created using three different binary vectors: pMDC123, pMDC32, and pNB96 [2, 65]. The expression vector used to create WPT536 was a modified version of the Cas9 MDC123 found on addgene.org (https://www.addgene.org/59184/). The vector was modified by replacing the 2x35S Cas9 promoter with a Glycine max ubiquitin promoter and adding the Rin4b (Glyma.16G090700) gRNA recognition sites. The WPT553 expression vector, MDC32/GUS/GmCas9, was originally developed and used in a previous publication. WPT608–1 and WPT608–3 used the same pSC218GG construct used in previous work, except with different gRNA recognition sites for the Glyma.16G209100 (and Glyma.09G159900) target sites.
Identification of CRISPR/Cas9 target sites
CRISPR target sites were identified using a soybean CRISPR design website (http://stuparcrispr.cfans.umn.edu/CRISPR/). Glyma numbers from the Wm82.a2.v1 soybean reference were used as input into the webtool, and target sites were screened for unique restriction sites designed to cut 3–5 bp upstream of the proto-spacer adjacent motif.
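The basic search that a target-site design tool performs can be sketched as a scan for protospacers followed by a PAM. The example below is not the design website's implementation: it assumes an NGG PAM (as used by Streptococcus pyogenes Cas9) and a 20-nt protospacer, scans only the forward strand, does not check restriction sites, and uses a hypothetical input sequence.

```python
import re


def find_targets(seq: str, spacer_len: int = 20):
    """List (start, protospacer, PAM) for every forward-strand NGG site.

    A lookahead is used so that overlapping candidates are all reported.
    `start` is the 0-based index of the first protospacer base.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence: 2 bp of padding, a 20-nt protospacer, a TGG PAM, 2 bp more.
example_seq = "TT" + "GATTACAGATTACAGATTAC" + "TGG" + "AA"
for start, spacer, pam in find_targets(example_seq):
    print(start, spacer, pam)
```

A fuller version would also scan the reverse complement and test each candidate for a unique restriction site overlapping the expected cut position.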
Delivery of expression vectors to soybean whole-plants
Constructs were delivered to the Bert-MN-01 background using 18r12, a disarmed k599 Agrobacterium rhizogenes strain. Methods for delivery and growth of whole-plant transformants were performed as previously described.
DNA extraction and identification of transgene insertion sites and mutation sites
Leaf tissue was harvested from five different soybean branches for each whole-plant transformant and extracted with a Qiagen DNeasy plant kit (item 69,106). DNA samples were sent to the University of Minnesota Genomics Center for sequencing using an Illumina HiSeq2500 with v4 chemistry to generate 125 bp parried-end reads. Sequencing was performed to approximately 20X genome coverage for each sample. Reads were checked for initial quality using Fastqc version 0.11.5 and Illumina Truseq adapters were trimmed using cutadapt version 1.8.1 with a minimum read length set to 40 bp and quality cutoff set to a phread score of 20 [66, 67]. To map reads to the soybean reference genome (Wm82.a2.v1), we used bwa version 0.7.12 with band width set to 100, mark shorter splits as secondary, and penalty for mismatch set to 6 . Samtools version 1.6 was used to convert any SAM file format to BAM format, sort, and index files . Identification of transgene insertion sites was performed in a manner similar to Srivastava et al. 2014 . Fasta files were created using the transgene cassette with 100 bp flanking backbone sequence to serve as our reference genome. Sequenced reads were then mapped to the transgene reference using the same programs and parameters used to map reads to the reference genome. To detect transgene insertion junctions, reads that mapped to the transgene on only one of the two paired ends were extracted using a modified version of extract_unmapped_mates.pl from , to accept bam files as input. The other paired ends (those that did not map to the transgene and were termed orphan reads) were then mapped to the Wm82.a2.v1 reference using bowtie2 version 2.2.4 using -- local -- very-sensitive-local  to identify the genomic sequences adjacent to the transgene insertion. SAM files were then converted to BAM file format, sorted and indexed in the same manner mentioned above. Orphaned reads that mapped to the reference were further investigated upon using IGV version 2.3.90 . 
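The orphan-read step described above can be illustrated with the FLAG field of SAM records from the alignment against the transgene reference. A pair marks a candidate junction when one mate maps to the transgene and the other does not. The sketch below is a stand-in written for illustration, not the extract_unmapped_mates.pl script; it relies only on the standard SAM flag bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped, 0x100 secondary, 0x800 supplementary), and the records shown are hypothetical.

```python
def split_orphan_pairs(sam_lines):
    """Separate transgene-anchored mates from their unmapped (orphan) mates.

    Returns (anchored_names, orphans), where orphans is a list of
    (read_name, sequence) tuples ready to be realigned to the host genome.
    """
    anchored, orphans = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if not flag & 0x1 or flag & 0x900:
            continue  # skip unpaired, secondary, and supplementary records
        read_unmapped = bool(flag & 0x4)
        mate_unmapped = bool(flag & 0x8)
        if read_unmapped and not mate_unmapped:
            orphans.append((name, seq))
        elif mate_unmapped and not read_unmapped:
            anchored.append(name)
    return anchored, orphans


# Hypothetical records: pairA spans a junction, pairB lies fully inside the
# transgene, and pairC does not map to the transgene at all.
example_sam = [
    "@SQ\tSN:transgene\tLN:9000",
    "pairA\t73\ttransgene\t100\t60\t10M\t=\t100\t0\tACGTACGTAC\tIIIIIIIIII",
    "pairA\t133\ttransgene\t100\t0\t*\t=\t100\t0\tTTGACCATGA\tIIIIIIIIII",
    "pairB\t99\ttransgene\t500\t60\t10M\t=\t700\t210\tGGGTTTAAAC\tIIIIIIIIII",
    "pairB\t147\ttransgene\t700\t60\t10M\t=\t500\t-210\tCCCAAATTTG\tIIIIIIIIII",
    "pairC\t77\t*\t0\t0\t*\t*\t0\t0\tAAAAACCCCC\tIIIIIIIIII",
    "pairC\t141\t*\t0\t0\t*\t*\t0\t0\tGGGGGTTTTT\tIIIIIIIIII",
]
anchored, orphans = split_orphan_pairs(example_sam)
print(anchored, orphans)
```

Only the pair with one mapped and one unmapped mate is retained; its orphan sequence is what would subsequently be aligned to the soybean reference to locate the insertion.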
The orphaned read mapping was then compared with the read mapping to the soybean reference and with that of the parental line (Bert-MN-01) as a control. Deletions were investigated using IGV at each CRISPR site throughout the genome. To automate this process, a custom bash script called TransGeneMap (https://github.com/MeeshCompBio/Soybean_Scripts) was created; it requires only the forward and reverse reads, the indexed reference genome, and the transgene sequence as input.
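As a rough illustration of how the stated parameters translate into command lines, the Python sketch below assembles (but does not run) the main steps. It is not the TransGeneMap script. Several details are assumptions: the `bwa mem` subcommand (the text names only bwa), separate bwa and bowtie2 index prefixes, the adapter sequence shown as a placeholder, and all file names and the function name `build_commands`.

```python
import shlex
from typing import Dict, List


def build_commands(
    fwd: str,
    rev: str,
    bwa_genome: str,       # assumed: bwa index prefix for Wm82.a2.v1
    bwa_transgene: str,    # assumed: bwa index prefix for the transgene FASTA
    bt2_genome: str,       # assumed: bowtie2 index prefix for Wm82.a2.v1
    prefix: str = "sample",
    adapter: str = "AGATCGGAAGAGC",  # placeholder adapter sequence (assumption)
) -> Dict[str, List[str]]:
    """Return argument lists for each step; nothing is executed here."""
    r1 = f"{prefix}_R1.trimmed.fastq"
    r2 = f"{prefix}_R2.trimmed.fastq"
    # Band width 100, mark shorter split hits as secondary, mismatch penalty 6.
    bwa_opts = ["-w", "100", "-M", "-B", "6"]
    return {
        # Quality cutoff 20 and minimum read length 40 bp, paired-end mode.
        "trim": ["cutadapt", "-q", "20", "-m", "40", "-a", adapter,
                 "-A", adapter, "-o", r1, "-p", r2, fwd, rev],
        # bwa mem writes SAM to stdout; the caller must redirect it.
        "map_genome": ["bwa", "mem", *bwa_opts, bwa_genome, r1, r2],
        "map_transgene": ["bwa", "mem", *bwa_opts, bwa_transgene, r1, r2],
        # Orphan reads are aligned as unpaired reads in local mode.
        "map_orphans": ["bowtie2", "--local", "--very-sensitive-local",
                        "-x", bt2_genome, "-U", f"{prefix}.orphans.fastq",
                        "-S", f"{prefix}.orphans.sam"],
        "sort_orphans": ["samtools", "sort", "-o", f"{prefix}.orphans.bam",
                         f"{prefix}.orphans.sam"],
        "index_orphans": ["samtools", "index", f"{prefix}.orphans.bam"],
    }


cmds = build_commands("reads_R1.fastq", "reads_R2.fastq",
                      "Wm82.a2.v1.fa", "transgene.fa", "Wm82.a2.v1")
for step, argv in cmds.items():
    print(f"{step}: {shlex.join(argv)}")
```

Keeping each step as an argument list makes the parameters easy to inspect before anything is executed. The orphan FASTQ file named here would have to be produced by a separate extraction step.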
Mutation analyses of CRISPR target sites were performed on T0 plants and progeny using PCR-based gel assays as previously described. Sanger sequencing of PCR amplicons or cloned PCR products was used to identify and confirm specific mutations at these sites.
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
Glycine max ubiquitin
WGS: Whole genome sequencing
Bibikova M, Golic M, Golic KG, Carroll D. Targeted chromosomal cleavage and mutagenesis in Drosophila using zinc-finger nucleases. Genetics. 2002;161:1169–75.
Curtin SJ, Zhang F, Sander JD, Haun WJ, Starker C, Baltes NJ, et al. Targeted mutagenesis of duplicated genes in soybean with zinc-finger nucleases. Plant Physiol. 2011;156:466–73.
Qi Y, Zhang Y, Zhang F, Baller JA, Cleland SC, Ryu Y, et al. Increasing frequencies of site-specific mutagenesis and gene targeting in Arabidopsis by manipulating DNA repair pathways. Genome Res. 2013;23:547–54.
Bogdanove AJ, Voytas DF. TAL effectors: customizable proteins for DNA targeting. Science. 2011;333:1843–6.
Christian ML, Demorest ZL, Starker CG, Osborn MJ, Nyquist MD, Zhang Y, et al. Targeting G with TAL effectors: a comparison of activities of TALENs constructed with NN and NK repeat variable di-residues. PLoS One. 2012;7:e45383.
Streubel J, Blücher C, Landgraf A, Boch J. TAL effector RVD specificities and efficiencies. Nat Biotechnol. 2012;30:593–5.
Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012;337:816–21.
Wang T, Wei JJ, Sabatini DM, Lander ES. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343:80–4.
Sander JD, Joung JK. CRISPR-Cas systems for editing, regulating and targeting genomes. Nat Biotechnol. 2014;32:347–55.
Čermák T, Curtin SJ, Gil-Humanes J, Čegan R, Kono TJY, Konečná E, et al. A multipurpose toolkit to enable advanced genome engineering in plants. Plant Cell. 2017;29:1196–217.
Schiml S, Fauser F, Puchta H. The CRISPR/Cas system can be used as nuclease for in planta gene targeting and as paired nickases for directed mutagenesis in Arabidopsis resulting in heritable progeny. Plant J. 2014;80:1139–50.
Belhaj K, Chaparro-Garcia A, Kamoun S, Nekrasov V. Plant genome editing made easy: targeted mutagenesis in model and crop plants using the CRISPR/Cas system. Plant Methods. 2013;9:39.
Shan Q, Wang Y, Li J, Zhang Y, Chen K, Liang Z, et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat Biotechnol. 2013;31:686–8.
Fauser F, Schiml S, Puchta H. Both CRISPR/Cas-based nucleases and nickases can be used efficiently for genome engineering in Arabidopsis thaliana. Plant J. 2014;79:348–59.
Feng Z, Zhang B, Ding W, Liu X, Yang DL, Wei P, et al. Efficient genome editing in plants using a CRISPR/Cas system. Cell Res. 2013;23:1229–32.
Mao Y, Zhang H, Xu N, Zhang B, Gou F, Zhu JK. Application of the CRISPR–Cas system for efficient genome engineering in plants. Mol Plant. 2013;6:2008–11.
Soyk S, Müller NA, Park SJ, Schmalenbach I, Jiang K, Hayama R, et al. Variation in the flowering gene SELF PRUNING 5G promotes day-neutrality and early yield in tomato. Nat Genet. 2017;49:162–8.
Liang Z, Chen K, Li T, Zhang Y, Wang Y, Zhao Q, et al. Efficient DNA-free genome editing of bread wheat using CRISPR/Cas9 ribonucleoprotein complexes. Nat Commun. 2017;8:14261.
Langner T, Kamoun S, Belhaj K. CRISPR crops: plant genome editing toward disease resistance. Annu Rev Phytopathol. 2018;56:479–512.
Khatodia S, Bhatotia K, Passricha N, Khurana SM, Tuteja N. The CRISPR/Cas genome-editing tool: application in improvement of crops. Front Plant Sci. 2016;7:506.
Jaganathan D, Ramasamy K, Sellamuthu G, Jayabalan S, Venkataraman G. CRISPR for crop improvement: an update review. Front Plant Sci. 2018;9:985.
Song G, Jia M, Chen K, Kong X, Khattak B, Xie C, et al. CRISPR/Cas9: a powerful tool for crop genome editing. Crop J. 2016;4:75–82.
Ma X, Zhu Q, Chen Y, Liu YG. CRISPR/Cas9 platforms for genome editing in plants: developments and applications. Mol Plant. 2016;9:961–74.
Paz MM, Martinez JC, Kalvig AB, Fonger TM, Wang K. Improved cotyledonary node method using an alternative explant derived from mature seed for efficient Agrobacterium-mediated soybean transformation. Plant Cell Rep. 2006;25:206–13.
Hinchee MAW, Connor-Ward DV, Newell CA, McDonnell RE, Sato SJ, Gasser CS, et al. Production of transgenic soybean plants using Agrobacterium-mediated DNA transfer. Nat Biotechnol. 1988;6:915–22.
Chee PP, Fober KA, Slightom JL. Transformation of soybean (Glycine max) by infecting germinating seeds with Agrobacterium tumefaciens. Plant Physiol. 1989;91:1212–8.
Veena V, Taylor CG. Agrobacterium rhizogenes: recent developments and promising applications. In Vitro Cell Dev Biol Plant. 2007;43:383–403.
Sanford JC. The biolistic process. Trends Biotechnol. 1988;6:299–302.
Jupe F, Rivkin AC, Michael TP, Zander M, Motley ST, Sandoval JP, et al. The complex architecture and epigenomic impact of plant T-DNA insertions. PLoS Genet. 2019;15:e1007819.
Collier R, Dasgupta K, Xing YP, Hernandez BT, Shao M, Rohozinski D, et al. Accurate measurement of transgene copy number in crop plants using droplet digital PCR. Plant J. 2017;90:1014–25.
Makarevitch I, Svitashev SK, Somers DA. Complete sequence analysis of transgene loci from plants transformed via microprojectile bombardment. Plant Mol Biol. 2003;52:421–32.
Svitashev SK, Somers DA. Genomic interspersions determine the size and complexity of transgene loci in transgenic plants produced by microprojectile bombardment. Genome. 2001;44:691–7.
Twyman RM, Christou P. Plant transformation technology – particle bombardment. In: Christou P, Klee H, editors. Handbook of Plant Biotechnology. New York: Wiley; 2004. p. 263–89.
Jackson SA, Zhang P, Chen WP, Phillips RL, Friebe B, Muthukrishnan S, et al. High-resolution structural analysis of biolistic transgene integration into the genome of wheat. Theor Appl Genet. 2001;103:56–62.
Jacobs TB, LaFayette PR, Schmitz RJ, Parrott WA. Targeted genome modifications in soybean with CRISPR/Cas9. BMC Biotechnol. 2015;15:16.
Michno JM, Wang X, Liu J, Curtin SJ, Kono TJ, Stupar RM. CRISPR/Cas mutagenesis of soybean and Medicago truncatula using a new web-tool and a modified Cas9 enzyme. GM Crops Food. 2015;6:243–52.
Sun X, Hu Z, Chen R, Jiang Q, Song G, Zhang H, et al. Targeted mutagenesis in soybean using the CRISPR-Cas9 system. Sci Rep. 2015;5:10342.
Li Z, Liu ZB, Xing A, Moon BP, Koellhoffer JP, Huang L, et al. Cas9-guide RNA directed genome editing in soybean. Plant Physiol. 2015;169:960–70.
Tang F, Yang S, Liu J, Zhu H. Rj4, a gene controlling nodulation specificity in soybeans, encodes a thaumatin-like protein but not the one previously reported. Plant Physiol. 2016;170:26–32.
Cai Y, Chen L, Liu X, Sun S, Wu C, Jiang B, et al. CRISPR/Cas9-mediated genome editing in soybean hairy roots. PLoS One. 2015;10:e0136064.
Cai Y, Chen L, Liu X, Guo C, Sun S, Wu C, et al. CRISPR/Cas9-mediated targeted mutagenesis of GmFT2a delays flowering time in soya bean. Plant Biotechnol J. 2018;16:176–85.
Curtin SJ, Xiong Y, Michno JM, Campbell BW, Stec AO, Čermák T, et al. CRISPR/Cas9 and TALENs generate heritable mutations for genes involved in small RNA processing of Glycine max and Medicago truncatula. Plant Biotechnol J. 2018;16:1125–37.
Benfey PN, Chua NH. The cauliflower mosaic virus 35S promoter: combinatorial regulation of transcription in plants. Science. 1990;250:959–66.
Hernandez-Garcia CM, Martinelli AP, Bouchard RA, Finer JJ. A soybean (Glycine max) polyubiquitin promoter gives strong constitutive expression in transgenic soybean. Plant Cell Rep. 2009;28:837–49.
Li JF, Norville JE, Aach J, McCormack M, Zhang D, Bush J, et al. Multiplex and homologous recombination–mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9. Nat Biotechnol. 2013;31:688–91.
Flickinger M, Jun G, Abecasis GR, Boehnke M, Kang HM. Correcting for sample contamination in genotype calling of DNA sequence data. Am J Hum Genet. 2015;97:284–90.
Chandrasekaran J, Brumin M, Wolf D, Leibman D, Klap C, Pearlsman M, et al. Development of broad virus resistance in non-transgenic cucumber using CRISPR/Cas9 technology. Mol Plant Pathol. 2016;17:1140–53.
Xing HL, Dong L, Wang ZP, Zhang HY, Han CY, Liu B, et al. A CRISPR/Cas9 toolkit for multiplex genome editing in plants. BMC Plant Biol. 2014;14:327.
Pyott DE, Sheehan E, Molnar A. Engineering of CRISPR/Cas9-mediated potyvirus resistance in transgene-free Arabidopsis plants. Mol Plant Pathol. 2016;17:1276–88.
Feng Z, Mao Y, Xu N, Zhang B, Wei P, Yang DL, et al. Multigeneration analysis reveals the inheritance, specificity, and patterns of CRISPR/Cas-induced gene modifications in Arabidopsis. Proc Natl Acad Sci USA. 2014;111:4632–7.
Osakabe Y, Watanabe T, Sugano SS, Ueta R, Ishihara R, Shinozaki K, et al. Optimization of CRISPR/Cas9 genome editing to modify abiotic stress responses in plants. Sci Rep. 2016;6:26685.
Butler NM, Atkins PA, Voytas DF, Douches DS. Generation and inheritance of targeted mutations in potato (Solanum tuberosum L.) using the CRISPR/Cas system. PLoS One. 2015;10:e0144591.
Wang Y, Cheng X, Shan Q, Zhang Y, Liu J, Gao C, et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat Biotechnol. 2014;32:947–51.
Yin X, Biswal AK, Dionora J, Perdigon KM, Balahadia CP, Mazumdar S, et al. CRISPR-Cas9 and CRISPR-Cpf1 mediated targeting of a stomatal developmental gene EPFL9 in rice. Plant Cell Rep. 2017;36:745–57.
Yan W, Chen D, Kaufmann K. Efficient multiplex mutagenesis by RNA-guided Cas9 and its use in the characterization of regulatory elements in the AGAMOUS gene. Plant Methods. 2016;12:23.
Zhang Z, Mao Y, Ha S, Liu W, Botella JR, Zhu JK. A multiplex CRISPR/Cas9 platform for fast and efficient editing of multiple genes in Arabidopsis. Plant Cell Rep. 2016;35:1519–33.
Bortesi L, Fischer R. The CRISPR/Cas9 system for plant genome editing and beyond. Biotechnol Adv. 2015;33:41–52.
Cai CQ, Doyon Y, Ainley WM, Miller JC, DeKelver RC, Moehle EA, et al. Targeted transgene integration in plant cells using designed zinc finger nucleases. Plant Mol Biol. 2009;69:699–709.
Tzfira T, Frankman LR, Vaidya M, Citovsky V. Site-specific integration of Agrobacterium tumefaciens T-DNA via double-stranded intermediates. Plant Physiol. 2003;133:1011–23.
Chilton MD, Que Q. Targeted integration of T-DNA into the tobacco genome at double-stranded breaks: new insights on the mechanism of T-DNA integration. Plant Physiol. 2003;133:956–65.
D’Halluin K, Vanderstraeten C, Van Hulle J, Rosolowska J, Van Den Brande I, Pennewaert A, et al. Targeted molecular trait stacking in cotton through targeted double-strand break induction. Plant Biotechnol J. 2013;11:933–41.
Ainley WM, Sastry-Dent L, Welter ME, Murray MG, Zeitler B, Amora R, et al. Trait stacking via targeted genome editing. Plant Biotechnol J. 2013;11:1126–34.
Zhang Y, Liang Z, Zong Y, Wang Y, Liu J, Chen K, et al. Efficient and transgene-free genome editing in wheat through transient expression of CRISPR/Cas9 DNA or RNA. Nat Commun. 2016;7:12617.
Somers DA, Makarevitch I. Transgene integration in plants: poking or patching holes in promiscuous genomes? Curr Opin Biotechnol. 2004;15:126–31.
Curtis MD, Grossniklaus U. A gateway cloning vector set for high-throughput functional analysis of genes in planta. Plant Physiol. 2003;133:462–9.
Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–2.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Srivastava A, Philip VM, Greenstein I, Rowe LB, Barter M, Lutz C, et al. Discovery of transgene insertion sites by high throughput sequencing of mate pair libraries. BMC Genomics. 2014;15:367.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29:24–6.
The authors are grateful to Dr. Candice Hirsch for helpful suggestions and comments on this manuscript. The authors are appreciative of the University of Minnesota’s Office of Information Technology for providing data storage and the Minnesota Supercomputing Institute for other computational needs.
This work was supported, in part, by the United States Department of Agriculture (Biotechnology Risk Assessment Project #2015–33522-24096).
Ethics approval and consent to participate
The development of recombinant plants was conducted at the University of Minnesota. The protocol for this work was approved by the University of Minnesota Institutional Biosafety Committee (protocol number 1507-32821H).
Consent for publication
R.M.S. is an inventor on one patent concerning plant gene editing. The remaining authors declare that they have no competing interests.
WPT construct metadata.
Screening of markers and mutations in the CRISPR transgenic series targeting Rin4b. Fig. S2. IGV screenshot of WGS at the gRNA target site for Rin4b. Fig. S3. IGV screenshot of the CRISPR/Cas9 transgene (targeting Rin4b) insertion event using WGS. Fig. S4. Read mapping coverage for the transgene encoding the CRISPR/Cas9 targeting Rin4b. Fig. S5. PCR assays for transgene presence and targeted mutations on chromosomes 16 and 09 in the WPT608–1 series. Fig. S6. Transgene detection in all offspring for WPT608–3. Fig. S7. IGV screenshot of the Glyma.16G209100 gRNA target site and transgene insertion on chromosome 16. Fig. S8. IGV screenshot of the Glyma.16G209100 paralog (Glyma.09G159900) gRNA target site and transgene insertion on chromosome 9. Fig. S9. Analysis of the CRISPR target sites on chromosomes 16 and 09 for series WPT608–3. Fig. S10. GS1 (Glyma.18G041100) mutations at the gRNA target site induced by CRISPR/Cas9. Fig. S11. Read mapping coverage for the transgene encoding the CRISPR/Cas9 targeting GS1.
Cite this article
Michno, JM., Virdi, K., Stec, A.O. et al. Integration, abundance, and transmission of mutations and transgenes in a series of CRISPR/Cas9 soybean lines. BMC Biotechnol 20, 10 (2020). https://doi.org/10.1186/s12896-020-00604-3