Universal MCS and its associated primer design
The design of the UMCS for universal subcloning is shown in Fig. 2a. The main features of this novel MCS design are as follows: (1) The UMCS is only 48 bp long, which is shorter than many existing MCSs. The sequence actually synthesized is 42 bp for each of the forward and reverse directions (Fig. 2b); annealing the two strands and ligating them to a predigested vector molecule reconstitutes the plasmid with the UMCS. (2) The restriction enzyme recognition sites at both termini are needed only for UMCS integration. They can be replaced freely to match the original MCS of the vector. (3) The restriction enzyme recognition site in the middle of the UMCS is used only for vector linearization and can be replaced with any other restriction site that is compatible with the vector. (4) The two homologous linker regions were optimized to have similar Tm values and the same GC content, which should facilitate PCR. Hairpin formation and dimerization of these regions were also minimized during optimization. (5) Primers for amplifying the insert can be designed simply by adding a homologous linker to their 5′-terminus, with the template-binding region chosen so that its calculated Tm is approximately 55 °C (Fig. 2c). (6) Theoretically, because the MCS is consistent, any vector carrying the UMCS sequence can be amplified with the “UMCS-PCR-F + UMCS-PCR-R” primer pair, and any insert cloned into the UMCS can be amplified with the “UMCS-Seq-F + UMCS-Seq-R” primer pair. (7) Neither linker region contains rare codons [23, 24]. If the linker sequence is expressed, its codons are therefore unlikely to affect expression of the target protein. This is crucial for vectors that express N-terminal tags.
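The primer layout described in point (5) can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the linker string is a placeholder rather than the actual UMCS linker, the template is a toy sequence, and the Tm is estimated with a common GC-content approximation, Tm = 64.9 + 41 × (G + C − 16.4) / N, which may differ from the calculation used for Fig. 2c.

```python
def basic_tm(seq: str) -> float:
    """Approximate Tm (deg C) from GC content; a rough estimate for oligos > 13 nt."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_primer(linker: str, template: str, target_tm: float = 55.0,
                  min_len: int = 15, max_len: int = 30) -> str:
    """Prepend the homologous linker to the shortest template-binding region
    whose estimated Tm reaches target_tm.

    `template` is read 5'->3' from the end to be primed; for a reverse primer,
    pass reverse_complement(insert) instead of the insert itself.
    """
    template = template.upper()
    for n in range(min_len, min(max_len, len(template)) + 1):
        binding = template[:n]
        if basic_tm(binding) >= target_tm:
            return linker.upper() + binding
    raise ValueError("no binding region within the length limits reaches target_tm")


# Hypothetical inputs: a placeholder 15-nt linker and a toy insert sequence.
PLACEHOLDER_LINKER = "ACGTACGTACGTACG"
TOY_INSERT = "ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATC"

forward = design_primer(PLACEHOLDER_LINKER, TOY_INSERT)
reverse = design_primer(PLACEHOLDER_LINKER, reverse_complement(TOY_INSERT))
print(forward)
print(reverse)
```

In practice the reverse primer would carry the second homologous linker rather than the same placeholder, and a nearest-neighbor Tm model would normally give a more reliable estimate than this approximation.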
Construction of pUC19-UMCS-EGFP
To demonstrate proof of principle, we first constructed two types of pUC19-UMCS-EGFP vectors (Fig. 3) as tools for studying transformation efficiency. These vectors are modified versions of pUC19 in which the lacZ sequence was first replaced by a UMCS-XhoI sequence carrying one of two linearization sites (SalI or EcoRV, as shown in Fig. 3a and 3b), after which EGFP (Enhanced Green Fluorescent Protein) was inserted by XhoI-based digestion and ligation. SalI digestion yields a linearized product with sticky ends, whereas EcoRV digestion yields blunt ends. Together with vectors linearized by PCR, these constructs allow the effect of different linearization methods on transformation to be clarified. During in vivo assembly, if the mCherry CDS is successfully cloned into the UMCS, the colony will appear orange. Otherwise, it will be green or colorless (through nonspecific assembly). Dividing the number of orange colonies by the total number of colonies gives the positive ratio of the transformation.
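The positive ratio defined above is simple arithmetic, sketched here for replicate plates. The colony counts are invented for illustration, and reporting the mean with the sample standard deviation is an assumption about how replicate values might be summarized, not a statement of the statistics used in this study.

```python
from statistics import mean, stdev


def positive_ratio(orange: int, total: int) -> float:
    """Fraction of colonies on a plate that are orange (mCherry cloned)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= orange <= total:
        raise ValueError("orange must lie between 0 and total")
    return orange / total


def summarize(plates):
    """Summarize replicate plates given as (orange, total) colony counts."""
    ratios = [positive_ratio(o, t) for o, t in plates]
    totals = [t for _, t in plates]
    return {
        "cfu_mean": mean(totals),
        "cfu_sd": stdev(totals),
        "ratio_mean": mean(ratios),
        "ratio_sd": stdev(ratios),
    }


# Hypothetical counts for three replicate plates: (orange, total).
example_plates = [(96, 100), (190, 200), (285, 300)]
print(summarize(example_plates))
```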
Effects of the linearization method and homologous linker length
Next, we evaluated how the linearization method and the length of the homologous linker affect the number of transformants and the corresponding positive ratio. As shown in Fig. 4a, when the vector was linearized by PCR, the transformation efficiency correlated positively with the length of the linker sequence. With a 6-bp linker, fewer than 100 CFU/plate and a positive ratio below 10% were obtained. In contrast, with 9-, 12-, 15-, and 18-bp linkers, the transformation efficiency increased to 259 ± 34, 742 ± 132, 1306 ± 123, and 1525 ± 165 CFU/plate, and the positive ratio rose to 78 ± 4.2%, 96 ± 1.5%, 95 ± 1.7%, and 97 ± 1.9%, respectively. A 6-bp linker is evidently insufficient for bacterial in vivo assembly, and a length longer than 9 bp is necessary for such experiments. Moreover, there were no significant differences between the 15- and 18-bp linkers. Considering both the colony number and the positive ratio, we concluded that the optimal length of the homologous linker for the PCR method is 12 to 15 bp. When linearization was instead carried out by restriction digestion (Fig. 4b and 4c), vectors digested to blunt ends (EcoRV) yielded many more positive clones than those with sticky ends at every linker length tested. According to these data, a 12 to 15-bp linker also provides enough transformants and positive clones for screening when a restriction enzyme is used for linearization. In these cases, further increasing the linker length is not necessary.
Effects of insert/vector ratio on transformation
The molar ratio of insert to vector is considered a critical factor for high-efficiency transformation [25]. We therefore investigated how PCR, SalI, and EcoRV linearization affect transformation efficiency at several insert/vector ratios. The yield of transformants and the positive ratio were determined at a linker length of 15 bp (Fig. 5 and Supplementary Figure S1) with molar ratios of 1, 5, 10, and 15. According to these data, the optimal molar ratio was 5 regardless of the linearization method used. Increasing the amount of insert further did not improve the results, whereas decreasing it reduced the transformation efficiency. Therefore, we used a molar ratio of 5 in all subsequent studies, which is consistent with the finding of Kostylev et al. [26].
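For readers setting up such a reaction, the insert mass that corresponds to a given molar ratio follows from the fragment lengths, because the mass of double-stranded DNA per mole scales with its length. The sketch below uses this standard conversion; the vector amount and vector size in the example are hypothetical, while the insert length corresponds to the 711-bp mCherry CDS plus two 15-bp linkers.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 5.0) -> float:
    """Mass of insert (ng) giving `molar_ratio` insert molecules per vector molecule.

    Uses the proportionality between dsDNA mass per mole and length in bp.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Hypothetical reaction: 50 ng of a 3000-bp linearized vector, 741-bp insert.
for ratio in (1, 5, 10, 15):
    print(ratio, round(insert_mass_ng(50, 3000, 741, ratio), 2))
```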
Effects of the digestive linearization method on the fidelity of assembly
We were interested in the assembly fidelity of linearization by restriction digestion for three main reasons: (1) Although high-fidelity enzymes such as Q5, Phusion, or KOD have extremely low error rates [27, 28], PCR products can still carry random errors introduced by the DNA polymerase. Moreover, the probability of introducing random mutations increases with the number of PCR cycles and the length of the vector, and verifying the product by sequencing is laborious. (2) Some vectors are difficult to amplify by PCR. For example, a GC content that is too high or too low, or too many repetitive sequences, can impair PCR or even make amplification impossible [29,30,31]. (3) As previously mentioned, if a vector cannot be linearized by PCR, or if fidelity is vital for the experiment, the methods shown in Fig. 6 must be applied. Under such circumstances, only the sequences flanking the linearization site are used as homologous linkers, to preserve the versatility of the UMCS. However, when the flanking sequences (15 bp) are used as homologous linkers, the bases left over after digestion act as nonhomologous sequences. These residual bases might therefore displace part of the insert and cause mutations at its 5′-terminus, 3′-terminus (confirmed during our pilot study), or both. The frequency of such events is a major concern for UMCS applications, since the UMCS is universal only if this frequency is low enough.
Therefore, SalI (Fig. 6a-d) and EcoRV (Fig. 6e-h) digestion were taken as sticky- and blunt-end examples to evaluate sequence replacement between the insert and the residual bases after assembly. As shown in Fig. 6, when the mCherry CDS is used as the insert, five products can be expected: (1) if the assembly fails, the colony appears green (only EGFP is expressed); (2) if the fragments assemble successfully and the mCherry CDS remains intact, the colony appears orange (only mCherry is expressed); (3) if the fragments assemble successfully but the 5′-terminus of the mCherry CDS is partially replaced, the colony appears orange (only mCherry is expressed); (4) if the fragments assemble successfully but the 3′-terminus of the mCherry CDS is partially replaced, the colony appears yellow because the mCherry stop codon is lost (an mCherry-EGFP fusion protein is expressed); and (5) if the fragments assemble successfully but both the 5′- and 3′-termini of the mCherry CDS are partially replaced, the colony also appears yellow (an mCherry-EGFP fusion protein is expressed). For subcloning, we assume that the 5′- and 3′-termini of the insert are replaced at the same frequency, so that this frequency can be derived by comparing the number of yellow colonies with the number of orange ones. Fig. 6d strongly suggests that the frequency of mutation at the 3′-terminus of the insert is low when SalI is used for vector linearization. In fact, less than 0.75% of positive clones contained a sequence replacement at the 3′-terminus or at both the 5′- and 3′-termini. Under the equal-frequency assumption, the probability of having at least one mutated terminus is therefore less than 1.5% when the linker is 12 or 15 bp long, and it is very unlikely that a mutated product would be picked by chance. Moreover, when EcoRV was used for linearization, replacement events appeared to be even less frequent than with SalI, at less than 0.3% for both the 12- and 15-bp linkers.
In conclusion, after digestion, the residual bases slightly affected the fidelity of assembly (an assumed mechanism of bacterial in vivo assembly with nonhomologous regions is given in Supplementary Fig. S2), but most of the positive colonies were still expected to contain the correct plasmid.
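The step from a 3′-terminus replacement frequency below 0.75% to a bound of 1.5% on clones with at least one mutated terminus can be made explicit. The sketch below assumes, as stated above, that both termini are replaced at the same frequency, and it takes the yellow fraction among all positive (orange plus yellow) colonies as the 3′ frequency; the colony counts are invented for illustration.

```python
def replacement_frequency(yellow: int, orange: int) -> float:
    """Fraction of positive colonies (orange + yellow) that are yellow,
    i.e. carry a replacement at the 3'-terminus (alone or with the 5' one)."""
    positives = yellow + orange
    if positives <= 0:
        raise ValueError("no positive colonies counted")
    return yellow / positives


def any_terminus_upper_bound(f_three_prime: float) -> float:
    """Upper bound on P(5' or 3' replaced), assuming P(5') = P(3').

    Union bound: P(A or B) <= P(A) + P(B) = 2 * f.
    """
    return min(1.0, 2.0 * f_three_prime)


# Hypothetical counts: 3 yellow and 397 orange colonies.
f3 = replacement_frequency(3, 397)
print(f"3' replacement frequency: {f3:.4f}")
print(f"bound on any mutated terminus: {any_terminus_upper_bound(f3):.4f}")
```

The bound is conservative because clones with both termini replaced are counted twice in the sum.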
Construction of larger plasmids based on UMCS
To further demonstrate the versatility of the UMCS, we constructed four additional vectors containing the UMCS: (1) pET24a(+)-UMCS, in which the UMCS was inserted between the original BamHI and XhoI restriction sites, resulting in a plasmid size of 5312 bp; (2) pACT-UMCS and pBind-UMCS, in which the UMCS was inserted between the original BamHI and XbaI restriction sites and the last “C” of the BamHI recognition sequence was intentionally removed to ensure correct expression of the insert, resulting in plasmid sizes of 5578 and 6372 bp, respectively; and (3) pCold TF DNA-UMCS, in which the UMCS was inserted between the original NdeI and XbaI restriction sites, resulting in a plasmid size of 5757 bp. Together with the previously constructed pUC19-UMCS-S-EGFP, these vectors were used to examine the transformation efficiency of cloning the mCherry (711 bp), RXRα (Retinoid X Receptor Alpha, 1389 bp), and p85α (Phosphoinositide 3-kinase p85α, 2175 bp) CDSs. In these experiments, each CDS was amplified as an insert with a 15-bp homologous linker region (as shown in Fig. 3a) at both termini, and each transformation was repeated four times. Colony PCR was performed on a panel of 24 colonies randomly selected from each transformation, followed by agarose gel electrophoresis to identify the positive colonies and determine the positive ratio. The results are shown in Fig. 7. In summary, (1) regardless of the linearization method, the yield of transformants decreased as the size of the final plasmid increased; (2) when PCR was used for vector linearization, no apparent correlation between the positive ratio and the size of the plasmid was observed, and the positive ratio remained high (> 85%) in all experiments; and (3) when SalI digestion was used, the positive ratio decreased as the size of the final product increased. However, in our study, this ratio always remained above 1/3, which is sufficient for screening positive clones by PCR.
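Why a positive ratio above 1/3 is sufficient for PCR screening can be illustrated with a short calculation. The sketch assumes that each picked colony is positive independently with the same probability, which is a simplification introduced here rather than a model taken from the experiments.

```python
import math


def prob_at_least_one_positive(positive_ratio: float, n_colonies: int) -> float:
    """Probability that at least one of n randomly picked colonies is positive."""
    return 1.0 - (1.0 - positive_ratio) ** n_colonies


def colonies_needed(positive_ratio: float, confidence: float = 0.95) -> int:
    """Smallest number of colonies to pick so that at least one is positive
    with the requested confidence (requires 0 < positive_ratio < 1)."""
    if not 0.0 < positive_ratio < 1.0:
        raise ValueError("positive_ratio must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - positive_ratio))


# Worst case reported for SalI digestion (ratio of 1/3) and a 24-colony panel.
print(prob_at_least_one_positive(1 / 3, 24))
print(colonies_needed(1 / 3, 0.95))
```

Under this assumption, even at a positive ratio of 1/3, a 24-colony panel is very likely to contain at least one positive clone, and a much smaller panel would usually be enough.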
Multi-fragment assembly
Similar to the single-fragment experiments, we used mCherry, RXRα, and RLuc (Renilla Luciferase) for the multi-fragment assembly test. As presented in Fig. 8a, the 5′-terminus of the first fragment and the 3′-terminus of the last fragment must carry homologous sequences corresponding to the UMCS. The linkers joining the other fragments can be designed according to previous studies [6]. Since multi-fragment assembly significantly reduces the yield of transformants and positive clones (Fig. 8b), we recommend linearizing the vector by PCR or dephosphorylating it after enzyme digestion. In addition, increasing the amount of DNA and the number of competent cells might be necessary for the assembly of more than three insert fragments.