- Methodology article
- Open Access
Development of a gene synthesis platform for the efficient large scale production of small genes encoding animal toxins
BMC Biotechnology volume 16, Article number: 86 (2016)
Gene synthesis is becoming an important tool in many fields of recombinant DNA technology, including recombinant protein production. De novo gene synthesis is quickly replacing classical cloning and mutagenesis procedures and allows the generation of nucleic acids for which no template is available. In addition, when coupled with efficient gene design algorithms that optimize codon usage, it can lead to high levels of recombinant protein expression.
Here, we describe the development of an optimized gene synthesis platform that was applied to the large-scale production of small genes encoding venom peptides. This improved gene synthesis method uses a PCR-based protocol to assemble synthetic DNA from pools of overlapping oligonucleotides and was developed to synthesise multiple genes simultaneously. The technology incorporates an accurate, automated and cost-effective ligation-independent cloning step that directly integrates the synthetic genes into an effective Escherichia coli expression vector. The robustness of this technology for generating large libraries of dozens to thousands of synthetic nucleic acids was demonstrated through the simultaneous synthesis of 96 genes encoding animal toxins.
An automated platform was developed for the large-scale synthesis of small genes encoding eukaryotic toxins. Large-scale recombinant expression of synthetic genes encoding eukaryotic toxins will allow the exploration of the extraordinary potency and pharmacological diversity of animal venoms, an increasingly valuable but largely unexplored source of lead molecules for drug discovery.
Synthetic biology, an interdisciplinary branch of biology, is quickly becoming one of the most attractive areas of research thanks to recent developments in gene synthesis technology. In combination with intelligent gene design, gene synthesis is emerging as a valuable tool to support recombinant protein expression. De novo gene design allows codon usage to be optimized for the recombinant host, thus promoting effective operation of the cellular translational machinery. In addition, when no nucleic acid template is available, gene synthesis allows DNA molecules to be created de novo. The exponential growth of genomic and metagenomic databases, together with the difficulty of exploiting this valuable sequence information in the absence of tangible DNA, is driving the rapid development of novel gene synthesis technologies.
In recent years, a variety of gene synthesis methodologies based on the assembly of oligonucleotides into complete genes have been developed. Early approaches used the enzymatic ligation of pre-formed duplexes of phosphorylated overlapping oligonucleotides. Subsequently, self-priming PCR, PCR assembly, polymerase chain assembly (PCA) and template-directed ligation were described as efficient concepts for the de novo synthesis of nucleic acids. More recently, two-step approaches were reported for the production of long DNA sequences. Examples of these technologies are the PCR-based thermodynamically balanced inside-out technology (TBIO), the two-step total gene synthesis method that combines dual asymmetrical PCR (DA-PCR) and overlap-extension PCR (OE-PCR), PCR-based two-step DNA synthesis (PTDS) and PCR-based accurate synthesis (PAS).
More recently, improved PCR-based gene synthesis methods, exemplified by the improved PCR synthesis (IPS) and simplified gene synthesis (SGS) protocols [8, 9], have been described that substantially simplify earlier strategies. SGS uses 40-nucleotide (nt) oligonucleotides with 18–20 nt overlap regions, which are assembled in a single PCR-assembly reaction that directly yields the full-length DNA molecule. The simplicity of this protocol, combined with its relatively low cost (the oligonucleotides require neither phosphorylation nor purification), provides a solid basis for developing even more effective PCR-based methods. However, major drawbacks persist, and current synthetic protocols need effective improvements before they can be translated to a large scale. One of the major bottlenecks of current gene synthesis protocols is the quality of the oligonucleotides used for nucleic acid assembly. All current gene synthesis methods are known to accumulate errors in the final synthetic molecules. Sequence errors usually derive from the incorporation of imperfect synthetic oligonucleotides or from the limited fidelity of the enzymatic assembly step. Current oligonucleotide synthesis methods produce sequences that are often prematurely terminated or contain internal mutations (error rates range from 1 to 10 mutations per kilobase (kb)). In addition, chemical synthesis of DNA molecules usually involves not only moderate to high error rates but also high costs. Moreover, the accuracy of a synthesized gene also depends on the fidelity of the DNA polymerase used to assemble the oligonucleotides into the final DNA sequence. DNA errors are therefore inevitable, and it is often necessary to remove incorrect synthetic DNA molecules using enzymatic methods [11, 12].
Improvements in oligonucleotide quality, error correction and DNA polymerase efficacy are thus urgently required.
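To see why error rates of 1 to 10 mutations per kb matter even for small genes, one can assume that errors occur independently and uniformly along the sequence, so the number of errors per clone is Poisson-distributed. This is a simplifying assumption, not a model used in this work; under it the expected fraction of error-free clones is exp(−rate × length). The function name below is hypothetical.

```python
import math


def error_free_fraction(errors_per_kb: float, length_bp: int) -> float:
    """Expected fraction of error-free clones, assuming errors are independent
    and Poisson-distributed along the sequence (a simplifying assumption)."""
    expected_errors = errors_per_kb * length_bp / 1000.0
    return math.exp(-expected_errors)


if __name__ == "__main__":
    # The two ends of the 1-10 errors/kb range, for a small gene and a 1 kb gene.
    for rate in (1.0, 10.0):
        for length in (300, 1000):
            print(f"{rate:4.0f} errors/kb, {length:4d} bp gene: "
                  f"{error_free_fraction(rate, length):.1%} error-free")
```

Under this model a 300 bp gene is expected to be error-free in roughly 74% of clones at 1 error/kb, but in only about 5% of clones at 10 errors/kb, which illustrates why oligonucleotide quality and polymerase fidelity largely determine the screening effort.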
Conventionally, PCR-based gene synthesis is employed to produce a single gene at a time. Automated platforms that effectively generate large libraries of nucleic acids are therefore urgently needed. The steps of a single PCR-assembly strategy must remain simple, accurate and robust when extended to the assembly of multiple genes simultaneously. To develop large-scale methods, many factors that affect the efficiency of gene assembly, such as DNA polymerase performance and oligonucleotide concentration and quality, require optimization. This work describes different approaches carried out to optimize current gene synthesis protocols. The data were integrated to develop a novel platform, which was applied to efficiently synthesize and clone a large number of nucleic acids encoding venom peptides. This automated platform can be applied to the rapid generation of complex gene libraries encoding different families of biotechnologically relevant and valuable proteins and peptides.
The original purpose of this research was to optimize protocols for the synthesis of small genes encoding eukaryotic peptides for expression in Escherichia coli. Three genes of different lengths (A: 290 bp, B: 260 bp, and C: 329 bp) were selected for these studies. Gene A encodes the alpha-elapitoxin-Nk2a toxin isolated from Naja kaouthia venom, and genes B and C encode two different venom peptides of unknown function. Venom genes were designed by back-translating the peptide sequence and optimizing codon usage for high levels of expression in E. coli. Guanine-cytosine (GC) content was constrained to between 40 and 60%. Gene design favoured stable mRNA molecules, minimized repeated sequences and avoided E. coli regulatory sequences such as promoters, activators or operators. In addition, the Codon Adaptation Index (CAI) was required to exceed 0.8. DNA sequences of the three genes are presented in Additional file 1: Table S1.
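Some of the design constraints listed above can be checked automatically. The sketch below is not the design software used in this work: it tests the 40–60% GC window, flags direct repeats of 12 nt or more, and searches the top strand for a short list of unwanted motifs. The repeat length, the function names and the two motifs (a consensus −10 promoter box and a Shine-Dalgarno-like sequence) are illustrative assumptions, not a complete list of E. coli regulatory elements.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def repeated_kmers(seq, k=12):
    """Return every k-mer that occurs more than once (direct repeats of >= k nt)."""
    seq = seq.upper()
    seen, repeats = set(), set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            repeats.add(kmer)
        seen.add(kmer)
    return repeats


# Hypothetical screening list, for illustration only (top strand only).
UNWANTED_MOTIFS = {
    "TATAAT": "-10 promoter box",
    "AGGAGG": "Shine-Dalgarno-like",
}


def screen_design(seq, gc_min=0.40, gc_max=0.60, k=12):
    """Return human-readable problems; an empty list means the design passes."""
    problems = []
    gc = gc_content(seq)
    if not gc_min <= gc <= gc_max:
        problems.append(f"GC content {gc:.0%} outside {gc_min:.0%}-{gc_max:.0%}")
    for kmer in sorted(repeated_kmers(seq, k)):
        problems.append(f"repeated {k}-mer {kmer}")
    for motif, label in UNWANTED_MOTIFS.items():
        if motif in seq.upper():
            problems.append(f"{label} motif {motif}")
    return problems


if __name__ == "__main__":
    print(screen_design("GCGCATATAATGCGC"))
```

A practical screen would also search the reverse complement and use a curated motif list; the point here is only that each constraint reduces to a simple, automatable test.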
Oligonucleotides and purification
Oligonucleotides were synthesized by three different suppliers (A, B and C) at the smallest available scale with standard desalting. Reverse-phase cartridge and reverse-phase HPLC purification were also tested for oligonucleotides obtained from supplier C. Each oligonucleotide was reconstituted in 10 mM Tris-HCl (pH 8.5) to a final concentration of 100 μM and stored at −20 °C. Oligonucleotides were used individually or in mixtures at the concentrations described below.
The DNA sequence of each gene was used as a template to design the assembly oligonucleotides by dividing the entire sequence into overlapping primers of defined lengths. The external oligonucleotides, termed outer primers, correspond to the external forward and reverse primers. For strategy A (see below), the outer primers include a 15-bp complementary sequence to promote plasmid re-circularization through homologous recombination (Fig. 1a). In strategy B (see below), the outer primers contained an additional sequence at the 5’ terminus of both forward and reverse primers for insertion into the cloning vector through a ligation-independent cloning (LIC) protocol (Fig. 1b). Internal oligonucleotides (termed inner primers) are more numerous than outer primers. Depending on the gene assembly strategy, the oligonucleotides were designed with or without gaps between adjacent primers and with 15–20 bp overlap regions. The sequences of all oligonucleotides used in these studies are displayed in Additional file 2: Table S2.
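The division of a gene into overlapping primers can be sketched as a simple tiling. In this minimal version (function names are hypothetical), consecutive oligonucleotides alternate between strands and overlap by a fixed length, so oligonucleotides on the same strand are separated by a gap; with 60-nt oligonucleotides and 20-nt overlaps this gives the 20-nt gaps used in strategy B3 and in the 96-gene platform. It does not balance overlap melting temperatures or append recombination arms or LIC tails to the outer primers, which a real design would also need.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(gene, oligo_len=60, overlap=20):
    """Split a gene into overlapping oligos that alternate between strands.

    Consecutive oligos share `overlap` nt, so oligos on the same strand are
    separated by a gap of (oligo_len - 2 * overlap) nt. Returns a list of
    (start, end, strand, sequence); reverse-strand oligos are written 5'->3'.
    The last oligo is shortened to end exactly at the end of the gene.
    """
    if overlap >= oligo_len:
        raise ValueError("overlap must be shorter than the oligonucleotide")
    step = oligo_len - overlap
    oligos, start = [], 0
    while True:
        end = min(start + oligo_len, len(gene))
        fragment = gene[start:end]
        if len(oligos) % 2 == 0:
            oligos.append((start, end, "+", fragment))
        else:
            oligos.append((start, end, "-", reverse_complement(fragment)))
        if end == len(gene):
            return oligos
        start += step


if __name__ == "__main__":
    gene = "ATGGCTAGC" * 27  # 243 bp placeholder sequence, not a real gene
    for start, end, strand, oligo in tile_oligos(gene):
        print(f"{start:3d}-{end:3d} {strand} {len(oligo)} nt")
```

Alternating strands is what allows each overlap to anneal to its neighbour and prime extension; the outer pair (first and last oligonucleotide) then carries any cloning tails.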
Novel strategies to synthesize small genes
To develop an efficient, cost-effective, low-error-rate strategy for producing small synthetic genes, two different strategies for constructing optimized DNA sequences were initially explored: (A) polymerase chain assembly using a DNA template (PCA-DT) and (B) DNA template-free polymerase chain assembly (PCA-DTF) (Fig. 1a and b, respectively). PCA-DT was developed to shorten traditional gene synthesis, since it combines synthesis and cloning in a single reaction. In step 1, either two long oligonucleotides (A1) or a pool of short oligonucleotides (A2 and A3) covering the gene sequence were mixed with the cloning vector, and assembly proceeded with a DNA polymerase in a standard thermal cycling reaction (Fig. 1a) (Table 1). In step 2, the PCR product, comprising both the newly synthesized gene and the vector, was used to transform E. coli cells. The PCA-DTF strategy is based on previously reported methods used to produce synthetic genes in a single PCR reaction [4, 9, 11]. All oligonucleotides (inner and outer) were pooled and assembled in a single polymerase chain reaction. The outer primers were used at a higher concentration than the inner primers to ensure construction of the full-length synthetic gene. Different approaches to oligonucleotide design were tested (Fig. 1, B1, B2 and B3). The PCA-DTF method requires a subsequent ligation-free cloning step to insert the synthetic gene into the cloning vector.
Three different strategies based on the PCA-DT method were used to synthesize gene A (290 nt). Two, fourteen and twelve oligonucleotides were designed to synthesize full-length gene A following strategies A1, A2 and A3, respectively. For strategy A1, the gene sequence was divided into two long 135-nt oligonucleotides, each including a 20-nt overlap with the cloning vector. For strategy A2, fourteen 20–40 nt oligonucleotides were designed with a 20-bp overlap region between forward and reverse primers and no gaps between adjacent oligonucleotides. For strategy A3, the twelve oligonucleotides were 35–40 nt long, and successive oligonucleotides overlapped by 20 bases. All outer primers contained an additional 15-bp homologous sequence to facilitate the homologous recombination associated with E. coli transformation. Plasmid pNZY28 was used as the cloning vector and was linearized with the EcoR V restriction enzyme for 2 h at 37 °C in a heating block. A typical digestion was performed in 100 μL containing 2 μg of plasmid DNA and 50 units of EcoR V. Linear plasmid DNA was purified using silica-based columns, eluted in 50 μL elution buffer and diluted to a final concentration of 20 ng/μL. Synthesis of gene A by strategy A1 was initiated by adding the two outer primers, at a final concentration of 200 nM, to 20 ng of digested pNZY28 vector. For strategies A2 and A3, outer and inner primers were used at final concentrations of 800 nM and 30 nM, respectively. The PCR reaction was carried out with 200 μM dNTPs and 2.5 units of Pfu Turbo DNA polymerase (Agilent Technologies). The PCR conditions were 30 cycles of 95 °C for 50 s, 50 °C for 50 s and 72 °C for 3 min, followed by a final 10 min at 72 °C to ensure complete extension of the 3,099 bp product (pUC18: 2,886 bp + gene A: 213 bp). PCR amplification products were analysed by agarose gel electrophoresis.
To synthesize gene A using strategies B1 and B2 of PCA-DTF, fourteen and twelve 40-nt oligonucleotides were designed with 15-nt and 20-nt end overlaps between consecutive oligonucleotides, respectively (Fig. 1, B1 and B2). Strategy B3 used longer, 60-nt oligonucleotides with 20-nt overlaps (Fig. 1, B3). Outer primers included an additional sequence for ligation-independent cloning (27 nt on the forward primer and 32 nt on the reverse primer) and an 18-bp sequence encoding the Tobacco Etch Virus (TEV) protease cleavage site. Oligonucleotide assembly was performed with Pfu Turbo DNA polymerase as described above. The assembly reaction comprised an initial denaturation at 95 °C for 5 min, followed by 26 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s and extension at 72 °C for 30 s. PCR products were column purified, as described above, and cloned into the pDONR201 vector using Gateway technology (see below).
Optimization of PCR conditions for a successful gene synthesis protocol
The efficacy and accuracy of the DNA polymerase and the quality and concentration of the primers are critical parameters known to influence gene synthesis. In addition, the annealing temperature and the times used for denaturation, annealing and extension during PCR may affect nucleic acid yields. To optimize these parameters, two genes (B and C) were synthesised using gene synthesis strategy B3. Six and eight 58–60 nt oligonucleotides with a 20-bp gap between primers were used to synthesise genes B and C, respectively. The sequences of the overlapping oligonucleotides used to produce genes B and C are presented in Additional file 3: Table S3. Each PCR parameter was tested individually, with the remaining components of the PCR reaction fixed at the standard conditions described above. Four DNA polymerases were selected for these studies: KOD Hot Start DNA polymerase (EMD Millipore), Q5® Hot Start High-Fidelity DNA polymerase (New England Biolabs), Pfu Turbo DNA polymerase (Agilent Technologies) and Taq DNA polymerase (Sigma-Aldrich). PCR consisted of 26 cycles. Denaturation was performed at 95 °C for 30 s for Pfu Turbo and Taq, at 95 °C for 16 s for KOD Hot Start and at 98 °C for 10 s for Q5® Hot Start High-Fidelity. Annealing was at 60 °C for 10 s for KOD Hot Start and at 60 °C for 30 s for Q5® Hot Start High-Fidelity, Pfu Turbo and Taq. Finally, extension was performed at 70 °C for 3 s for KOD Hot Start, at 72 °C for 30 s for Pfu Turbo and Taq, and at 72 °C for 15 s for Q5® Hot Start High-Fidelity. Different concentrations of overlapping oligonucleotides were tested: gene assembly was performed with inner oligonucleotides at final concentrations of 10, 20 and 30 nM and outer primers at final concentrations of 200, 600, 800 and 1000 nM. In addition, the final concentration of dNTPs was varied between 0.1 and 0.5 mM. Finally, different PCR profiles were tested.
Specifically, PCRs were performed at five different annealing temperatures (from 50 °C to 62 °C). Furthermore, a total of 24 PCR programs, combining different denaturation, annealing and extension times with a total of 22, 24 or 26 cycles, were tested. The final configuration of each PCR program used in these studies is presented in Additional file 4: Table S4.
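The one-factor-at-a-time design described above, in which each PCR parameter is varied while the others stay at a reference condition, can be enumerated programmatically. In this sketch the reference condition (`BASELINE`) and the intermediate dNTP levels are assumptions chosen for illustration; the remaining levels are those listed in the text, and the names are hypothetical.

```python
BASELINE = {  # hypothetical reference condition, for illustration only
    "polymerase": "KOD Hot Start",
    "inner_nM": 20,
    "outer_nM": 800,
    "dNTP_mM": 0.2,
    "anneal_C": 60.0,
    "cycles": 26,
}

LEVELS = {
    "polymerase": ["KOD Hot Start", "Q5 Hot Start HF", "Pfu Turbo", "Taq"],
    "inner_nM": [10, 20, 30],
    "outer_nM": [200, 600, 800, 1000],
    "dNTP_mM": [0.1, 0.2, 0.3, 0.4, 0.5],  # assumed steps within 0.1-0.5 mM
    "anneal_C": [50.0, 52.3, 54.6, 59.6, 62.0],
    "cycles": [22, 24, 26],
}


def one_factor_at_a_time(baseline, levels):
    """Return the baseline plus one condition per non-baseline level of each factor."""
    conditions = [dict(baseline)]
    for factor, values in levels.items():
        for value in values:
            if value != baseline[factor]:
                condition = dict(baseline)
                condition[factor] = value
                conditions.append(condition)
    return conditions


if __name__ == "__main__":
    conditions = one_factor_at_a_time(BASELINE, LEVELS)
    print(f"{len(conditions)} reactions")
    for c in conditions:
        print(c)
```

This design keeps the number of reactions small, at the cost of not detecting interactions between factors (for example between primer and dNTP concentrations).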
Cloning and sequencing
After PCR assembly, the resulting nucleic acids were purified and inserted into a suitable vector, and the integrity of each gene was confirmed by Sanger sequencing. Genes synthesized by strategy A were inserted into the pNZY28 cloning plasmid by the homologous recombination machinery of E. coli cells. Gene products from strategies B1, B2 and B3, which contained attB1 and attB2 sequences at their 5’ and 3’ ends, respectively, were cloned into the pDONR201 vector (ThermoFisher Scientific) using the Gateway cloning system (ThermoFisher Scientific). Cloning reaction mixtures were used to transform E. coli DH5α competent cells. For each transformation, one bacterial colony was inoculated and grown in liquid LB medium supplemented with 100 μg/mL ampicillin. Plasmids were purified and the integrity of the inserted nucleic acids was confirmed by sequencing.
Construction of a novel gene synthesis platform for the large scale production of small synthetic genes
An integrated gene synthesis platform was developed for the efficient production of small synthetic genes. This platform combines automation, simplicity and robustness while decreasing the error rate associated with conventional gene synthesis methods. The initial experiments described above defined the most appropriate PCR assembly protocol. Subsequent experiments evaluated the efficacy of the protocol when applied to the simultaneous synthesis of 96 genes encoding venom peptides. The ninety-six genes were designed by back-translating the corresponding peptide sequences and optimizing codon usage for high levels of expression in E. coli. Codons were selected randomly using a Monte Carlo approach according to the codon usage of highly expressed E. coli genes. Genes were designed to have a GC content between 40 and 60% and a codon adaptation index (CAI) higher than 0.8. The sequences of the 96 optimized genes are presented in Additional file 5: Table S5. The primers required to synthesize the 96 genes were designed to be 50–60 nt long, with a 20-nt overlap region between forward and reverse primers and a 20-nt gap, and included an additional 16-bp conserved sequence at the 5’ terminus of both forward and reverse outer primers to allow ligation-independent cloning. Each gene was thus produced from a mix of six oligonucleotides. Oligonucleotides were synthesised by Integrated DNA Technologies at the smallest scale (primer solutions at 5 μM) with desalting purification. The two outer primers were used at a final concentration of 800 nM, while the inner primers were pooled in an equimolar mixture to a final concentration of 20 nM. Primer dilutions and PCR assembly were carried out in a 96-well plate format using a Tecan workstation (Switzerland).
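The Monte Carlo codon selection can be sketched as weighted random sampling followed by rejection of designs that violate the GC and CAI limits. The codon frequencies below are invented placeholders covering only five amino acids, not the E. coli usage table used in this work, and accepting the first passing design is an assumption about how the constraints were applied. CAI is computed in its usual form, as the geometric mean of each codon's frequency relative to the most frequent synonymous codon; function names are hypothetical.

```python
import math
import random

# Hypothetical codon-usage frequencies (illustrative values, not real E. coli data).
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "C": {"TGC": 0.55, "TGT": 0.45},
    "G": {"GGC": 0.40, "GGT": 0.35, "GGA": 0.10, "GGG": 0.15},
    "R": {"CGT": 0.50, "CGC": 0.40, "AGA": 0.05, "CGG": 0.05},
}


def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def cai(codons, peptide):
    """Geometric mean of w = f(codon) / f(most used synonymous codon),
    skipping amino acids encoded by a single codon."""
    logs = []
    for aa, codon in zip(peptide, codons):
        usage = CODON_USAGE[aa]
        if len(usage) > 1:
            logs.append(math.log(usage[codon] / max(usage.values())))
    return math.exp(sum(logs) / len(logs)) if logs else 1.0


def back_translate(peptide, rng, gc_range=(0.40, 0.60), min_cai=0.8, max_tries=10_000):
    """Sample codons in proportion to usage; return the first design that meets
    the GC and CAI limits, or None if none is found within max_tries."""
    for _ in range(max_tries):
        codons = [
            rng.choices(list(CODON_USAGE[aa]), weights=list(CODON_USAGE[aa].values()))[0]
            for aa in peptide
        ]
        dna = "".join(codons)
        if gc_range[0] <= gc_content(dna) <= gc_range[1] and cai(codons, peptide) >= min_cai:
            return dna
    return None


if __name__ == "__main__":
    print(back_translate("MKCGRCGKGR", random.Random(42)))
```

Random sampling rather than always choosing the most frequent codon spreads codon usage across synonymous codons, while the rejection step enforces the same GC and CAI limits as the designed genes.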
KOD Hot Start DNA polymerase (EMD Millipore) was used for PCR assembly under conditions optimized to minimize primer-dimer formation and nonspecific amplification. PCR reactions were performed in a 50 μL total volume containing 0.2 mM dNTPs, 1.5 mM MgCl2, 1× reaction buffer and 1 unit of KOD Hot Start DNA polymerase, in a 96-well PCR plate format. The cycling parameters were: 1 cycle of 95 °C for 2 min; 26 cycles of 95 °C for 20 s, 60 °C for 8 s and 70 °C for 3 s. After assembly, PCR products were visualized by agarose gel electrophoresis and purified by silica-based chromatography on a Tecan liquid handler (Switzerland). Purified PCR products were cloned into the pHTP1 expression vector (NZYTech, Ltd) using the NZYEasy cloning kit (NZYTech, Ltd), which is based on LIC technology. Gene assembly products were mixed with 120 ng of linearized vector at a 1:5 vector:insert molar ratio. Cloning reactions were performed in a 10 μL volume in a 96-well PCR plate and proceeded for 1 h at 37 °C on a heating block. The mixtures were then incubated at 80 °C for 10 min, followed by 10 min at 30 °C. Recombinant plasmids were transformed into E. coli DH5α competent cells using a high-throughput method and spread on LB agar plates supplemented with 50 μg/mL kanamycin. After overnight incubation at 37 °C, a single colony per transformation was picked and grown in 5 mL of LB kanamycin medium in 24-deep-well plates sealed with gas-permeable adhesive seals. Cultures were incubated at 37 °C for ~16 h, and cells were harvested at 1,500 × g for 15 min. Plasmids were purified from the bacterial pellets on a Tecan workstation (Switzerland), and the DNA sequence of each gene was verified by Sanger sequencing. If the DNA sequence did not match the designed gene, a second and, if necessary, a third colony was picked for sequencing.
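The amount of assembled insert needed for the 1:5 vector:insert molar ratio follows from the fact that, for double-stranded DNA, the number of moles is proportional to mass divided by length. The pHTP1 vector length is not given in the text, so the 5,000 bp used below is a placeholder, as is the 272 bp insert (a 240 bp gene plus two 16-nt LIC tails); the function name is hypothetical.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_per_vector):
    """Mass of insert (ng) giving `insert_per_vector` insert molecules per vector
    molecule, assuming dsDNA mass is proportional to length."""
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


if __name__ == "__main__":
    # 120 ng of vector at a 1:5 vector:insert ratio; lengths are placeholders.
    ng = insert_mass_ng(vector_ng=120, vector_bp=5000, insert_bp=272, insert_per_vector=5)
    print(f"insert needed: {ng:.1f} ng")
```

Because the inserts are small, a fivefold molar excess still corresponds to only a few tens of nanograms of PCR product under these assumptions.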
Results and discussion
Synthesis and assembly of the 213 nt gene encoding alpha-elapitoxin-Nk2a toxin using PCA-DT and PCA-DTF methods
The gene encoding the toxic peptide alpha-elapitoxin-Nk2a was designed to maximize expression in E. coli. Initial experiments aimed to identify the most appropriate strategy for the synthesis of small genes and to reduce the number of steps involved in traditional gene synthesis approaches. The efficiency of the PCA-DT and PCA-DTF PCR-based methods was therefore tested for the synthesis of gene A, which has a size of 290 nt (Fig. 1). To keep the process simple and fast, PCA-DT does not involve a separate cloning step. In method A1 (Fig. 1), gene A was synthesised using the pNZY28 vector as DNA template and two long oligonucleotides (135 nt) to amplify the full-length plasmid, adding half of the gene sequence at each of its 5’ and 3’ ends. Because long oligonucleotides are more prone to errors, gene A was also amplified using sets of 14 and 12 overlapping oligonucleotides (Fig. 1, strategies A2 and A3, respectively), which contain a 20-nt sequence that hybridises with the cloning plasmid. In contrast, PCA-DTF methods assemble the artificial gene without a template. All six methods described in Fig. 1 were employed to synthesize gene A. The data, presented in Fig. 2, revealed that all strategies effectively generated the toxin gene. However, the yield of the assembled target nucleic acid was higher with the B strategies than with PCA-DT. Although strategy A does not require an additional step to insert the synthetic gene into the cloning plasmid, the overall process is more tedious because of the long times required to amplify large nucleic acids, as the PCR product comprises both the toxin gene and the vector (~3 kb, pNZY28 vector plus target gene).
In addition, cloning efficiency, evaluated through colony-PCR, revealed an approximately 95% cloning efficacy of the Gateway system versus 80% when using the self-ligation process of strategy A. These results suggest that strategies including a DNA template and involving plasmid re-circularization are probably not the best option for the synthesis of small genes as they are more tedious while leading to lower cloning efficiencies. These two issues are particularly important when protocols require automation.
To verify whether oligonucleotide length influences the appearance of errors in synthetic genes, we selected 24 clones from each of strategies B1, B2 and B3 for DNA sequence verification by Sanger sequencing. Analysis of the 72 sequences revealed that approximately 80% of the clones from each of the three strategies had the correct DNA sequence (Fig. 2c). Thus, increasing primer size from 40 to 60 nt had no detectable impact on the number of errors in the resulting synthetic DNA. This primer size was also used by Stemmer and colleagues to successfully synthesise two long DNA fragments, although the error rate associated with that method was not reported. Taken together, the data suggest that the PCA-DTF gene assembly method using a set of 60-nt overlapping oligonucleotides with 20-nt gaps is the most convenient strategy for the synthesis of small genes, as it provides high gene synthesis yields and efficacy together with higher cloning efficiencies.
Performance of various thermostable DNA polymerases for gene synthesis
Taq, KOD and Pfu polymerases have been commonly used for the production of synthetic genes by PCR-based methods [4, 7, 13]. However, Taq polymerase is known to be error-prone. KOD and Pfu polymerases allow higher accuracy in gene synthesis, although the elongation rate of Pfu polymerases is lower. Here, the efficacy of four different DNA polymerases (KOD Hot Start DNA polymerase, Q5® Hot Start HF, Pfu Turbo DNA polymerase and Taq DNA polymerase) for the production of gene B (260 bp) using strategy B3 of PCA-DTF (described above) was analysed. The data, presented in Fig. 3, revealed that all four polymerases effectively assembled the 260 bp gene. However, KOD Hot Start DNA polymerase appeared to perform better than the other enzymes, suggesting that KOD polymerase is more effective than Taq and Pfu for assembling small genes. After cleaning of the PCR products, genes were cloned into the pDONR201 vector, and 24 clones assembled by each of the four DNA polymerases were sequenced. The data, presented in Table 2, revealed that mutations were most frequent with Taq, followed by Q5® Hot Start HF and Pfu Turbo DNA polymerase. In contrast, only five of the 22 recombinant plasmids containing synthetic gene B assembled by KOD Hot Start DNA polymerase carried errors, which corresponds to one mutation per 1.15 kb. Deletions and substitutions were the most frequent errors identified in the 96 variants of gene B sequenced. As expected, the data suggest that KOD Hot Start DNA polymerase is more accurate than the other three DNA polymerases owing to its higher fidelity. These results are in line with previous studies [8, 9, 16] showing that high-fidelity DNA polymerases decrease the number of errors introduced into synthetic genes during PCR amplification.
Moreover, the PCA-DTF procedure appears to be very efficient with KOD Hot Start DNA polymerase owing to the rapid elongation rate of this enzyme; the gene synthesis protocol is completed in less than 40 min.
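A rough check of the run time can be made by summing the step durations of the optimized KOD program (2 min at 95 °C, then 26 cycles of 95 °C for 20 s, 60 °C for 8 s and 70 °C for 3 s). Ramping between temperatures depends on the thermal cycler, so the uniform 2 °C/s ramp rate used below is an assumption, and the function name is hypothetical.

```python
def program_seconds(initial, cycle_steps, cycles, ramp_c_per_s=None):
    """Total duration of a simple PCR program in seconds.

    initial: (temp_C, seconds) steps run once before cycling.
    cycle_steps: (temp_C, seconds) steps repeated `cycles` times.
    ramp_c_per_s: assumed uniform ramp rate; None counts hold times only.
    """
    steps = list(initial) + list(cycle_steps) * cycles
    total = sum(seconds for _, seconds in steps)
    if ramp_c_per_s:
        temps = [temp for temp, _ in steps]
        total += sum(abs(b - a) for a, b in zip(temps, temps[1:])) / ramp_c_per_s
    return total


if __name__ == "__main__":
    initial = [(95, 120)]
    cycle = [(95, 20), (60, 8), (70, 3)]
    holds = program_seconds(initial, cycle, 26)
    with_ramps = program_seconds(initial, cycle, 26, ramp_c_per_s=2.0)
    print(f"holds only: {holds / 60:.1f} min; with 2 C/s ramps: {with_ramps / 60:.1f} min")
```

Hold times total about 15 min; with the assumed ramp rate the program takes about 30 min, consistent with completion in under 40 min. The very short 3 s extension is what keeps the hold time low.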
Oligonucleotide concentration influences the efficacy of gene synthesis
Since PCA-DTF appeared to be the best method for producing small synthetic genes and KOD Hot Start DNA polymerase the most effective enzyme for these protocols, we analysed the influence of oligonucleotide concentration on gene assembly efficiency. Three different concentrations of inner oligonucleotides were combined with different concentrations of outer primers in PCR-assembly reactions set up to synthesize gene B. Initially, inner oligonucleotides were tested at 10 nM and 30 nM, and outer oligonucleotides at 200, 600 and 1000 nM. After assembly of gene B, the resulting nucleic acids were analysed by agarose gel electrophoresis. The data, presented in Fig. 4a, suggest that gene synthesis is most effective with 30 nM inner primers and that the best outer primer concentrations are 600 and 1000 nM. To define the optimal primer concentrations more precisely, the outer primer concentration was fixed at 800 nM and the inner primer concentration was varied from 10 to 30 nM. At 800 nM outer primers, the optimal concentration of inner oligonucleotides appears to be 20 nM (Fig. 4b). The assembly reaction was also performed with different concentrations of dNTPs. The results suggest that dNTP concentrations below 0.2 mM are not appropriate for gene synthesis, and that 0.3–0.4 mM dNTPs is the most effective range for a robust PCA-DTF procedure (Fig. 4).
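The optimized concentrations translate into pipetting volumes through C1V1 = C2V2. This sketch assumes the 50 μL reaction volume and the 5 μM primer stocks described for the 96-gene platform, and treats 20 nM as the concentration of each inner primer; the function name is hypothetical.

```python
def stock_volume_ul(final_nM, final_volume_ul, stock_nM):
    """Volume of stock (uL) needed to reach final_nM in final_volume_ul, from C1*V1 = C2*V2."""
    if stock_nM < final_nM:
        raise ValueError("stock is less concentrated than the target")
    return final_nM * final_volume_ul / stock_nM


if __name__ == "__main__":
    reaction_ul = 50.0
    stock_nM = 5_000.0  # 5 uM primer solutions
    outer = stock_volume_ul(800, reaction_ul, stock_nM)
    inner = stock_volume_ul(20, reaction_ul, stock_nM)
    print(f"each outer primer: {outer:.1f} uL; each inner primer: {inner:.2f} uL")
```

Each outer primer needs 8 μL of 5 μM stock, but each inner primer only 0.2 μL, a volume too small to pipette reliably; pooling the inner primers beforehand, as done for the 96-gene platform, is one practical way around this.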
Effect of cycling temperatures on the efficiency of gene synthesis
In the previous PCA-DTF assembly reactions, the annealing temperature was set to 60 °C. Several studies have suggested that factors such as melting temperature (Tm) and GC content affect optimal assembly. The efficiency of synthesis of gene B was therefore tested across a gradient of annealing temperatures (50 °C, 52.3 °C, 54.6 °C, 59.6 °C and 62 °C) under the optimal PCR conditions described in the previous section. The data, presented in Fig. 5, suggest that oligonucleotide assembly occurs at temperatures from 50 to 62 °C, although nucleic acid yields seem to increase at higher annealing temperatures. In addition, the effect of the number of PCR cycles on the efficiency of gene synthesis was tested by synthesizing gene C with 22, 24 and 26 thermal cycles. As expected, the quantity of amplified product increased with the number of thermal cycles (Fig. 5). Likewise, with a 20 s denaturation step, a 3 s extension produced higher DNA yields than a 1 s extension, and annealing the overlapping oligonucleotides for 8 s was more favourable than for 10 s. Therefore, 26 thermal cycles of denaturation (20 s), annealing (8 s) and extension (3 s) are optimal for PCR assembly by the PCA-DTF method.
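To relate annealing temperature to overlap composition, a crude estimate of overlap Tm can be obtained from the GC-count formula Tm = 64.9 + 41 (n_GC − 16.4) / N, where N is the length in nt. This is a rough approximation that ignores salt and oligonucleotide concentration (nearest-neighbour models are more accurate), and the two 20-nt overlaps below are made-up examples; the function name is hypothetical.

```python
def basic_tm(seq):
    """Rough melting temperature (deg C) from Tm = 64.9 + 41 * (nGC - 16.4) / N,
    intended for oligos longer than ~13 nt; ignores salt, oligo concentration
    and nearest-neighbour stacking."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


if __name__ == "__main__":
    for overlap in ("ATGAAATGCGGTCGTTGCGG", "GGCGCTGCCGCAGCGGTGCA"):
        gc = (overlap.count("G") + overlap.count("C")) / len(overlap)
        print(f"{overlap}  GC {gc:.0%}  Tm ~{basic_tm(overlap):.1f} C")
```

The 80% GC overlap is estimated to melt about 10 °C higher than the 55% GC one, which illustrates one reason why keeping GC content within a narrow window can help the overlaps anneal uniformly at a single annealing temperature.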
Effect of oligonucleotide source in the efficacy of gene synthesis
The efficacy of gene synthesis is well known to depend directly on the quality of the synthetic oligonucleotides used for DNA assembly. Current chemical synthesis methods usually produce oligonucleotides that are prematurely terminated or contain internal insertions or deletions. To determine how the oligonucleotide source affects the production of error-free DNA fragments, gene C was synthesised using desalted, reverse-phase cartridge-purified and reverse-phase HPLC-purified primers obtained from three different suppliers. Gene C was assembled using the PCR conditions defined above, and five different oligonucleotide sources were analysed. The results, presented in Fig. 6, show that oligonucleotides from supplier B displayed the best performance. Plasmid DNA from 16 recombinant clones for each condition was analysed by Sanger sequencing. The highest percentage of error-free clones was obtained for genes synthesized with primers from supplier B that were not subjected to any purification (Table 3). PCR products assembled from reverse-phase cartridge- and HPLC-purified oligonucleotides had lower percentages of error-free clones (B2, 50% and B3, 56%, respectively) than those assembled from desalted-only oligonucleotides. Notably, therefore, oligonucleotide purification does not reduce the mutation frequency observed in the artificial genes. The most frequent mutation identified in the 80 recombinant clones was a single-base deletion (44%, see Table 3 and Additional file 6: Table S6). Since the desalted oligonucleotides from supplier B yielded fewer deletions, these results suggest that truncated (n-1) oligonucleotides are difficult to remove by additional purification methods. In contrast, previous studies have recommended PAGE purification of oligonucleotides for the successful production of synthetic genes by PCR assembly [8, 13, 16, 18].
However, the error rates reported in these studies (~1 error per kb of synthetic DNA) are similar to that of our gene synthesis method, indicating that oligonucleotide purification is not crucial for the accurate production of DNA fragments. In a high-throughput setting, oligonucleotide purification can be a disadvantage, since it would significantly increase production costs.
Large scale synthesis of genes encoding venom peptides using an automated platform
The optimized protocols described above were used to develop a platform for the simultaneous synthesis of small genes. The primary sequences of 96 venom peptides were used to design 96 genes with an average GC content of 49% and an average CAI of 0.86 (Table 4). To assemble the 96 genes, 576 oligonucleotides (6 primers × 96 genes) of at most 60 nt were designed with a 20-bp overlap region and a 20-bp gap. On average, the genes were 240 bp long, and the oligonucleotides were acquired without additional purification. Each gene was assembled by PCR using KOD Hot Start DNA polymerase, with outer primers (forward and reverse) at 800 nM and inner primers at a final concentration of 20 nM. PCR assembly was performed with 26 cycles of 95 °C for 20 s, 60 °C for 8 s and 70 °C for 3 s. The 96 genes were assembled simultaneously in a 96-well PCR plate, and the resulting nucleic acids were analysed by agarose gel electrophoresis. The data, presented in Fig. 7, revealed that 94 of the 96 genes were effectively assembled, a 98% success rate for the gene synthesis protocol at large scale. After purification of the 96 PCR products, individual genes were sub-cloned into the pHTP1 expression vector using a LIC method. The robustness and effectiveness of the pipeline were confirmed when the recombinant plasmids were sequenced to verify gene integrity. The initial screen of one clone per gene revealed that 77 genes (80.2%) were correct (Table 5). For 17 genes (17.7%), two clones had to be screened to identify an error-free DNA fragment. Finally, for 2 genes (2.1%), a third clone was needed to obtain a correct DNA sequence. Even for the two genes that appeared to amplify at a lower concentration, a correct clone was obtained. In total, 26 mutations were identified in the incorrect genes, corresponding to an overall error rate of 1 mutation per 0.9 kb.
The majority of the identified mutations were deletions (77%), as expected from the incorporation of prematurely terminated oligonucleotides. The remaining mutations were single-base substitutions (19%) and insertions (4%).
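The reported error rate and mutation spectrum can be reproduced with simple arithmetic. The sketch below makes two assumptions not stated in the article: that the error rate was normalised to 96 genes × 240 bp (the average gene length), and that the 26 mutations split into 20 deletions, 5 substitutions and 1 insertion, counts back-calculated from the reported percentages. The function names are hypothetical.

```python
from typing import Dict

GENES = 96
MEAN_GENE_LENGTH_BP = 240
TOTAL_MUTATIONS = 26

# Assumption: counts back-calculated from the reported 77% / 19% / 4%.
MUTATION_COUNTS: Dict[str, int] = {"deletion": 20, "substitution": 5, "insertion": 1}


def error_rate_per_kb(mutations: int, genes: int, mean_length_bp: float) -> float:
    """Mutations per kb, taking genes x mean length as the total sequence (assumed)."""
    return mutations / (genes * mean_length_bp / 1000)


def spectrum_percent(counts: Dict[str, int]) -> Dict[str, int]:
    """Share of each mutation type, rounded to whole percent."""
    total = sum(counts.values())
    return {kind: round(100 * n / total) for kind, n in counts.items()}


if __name__ == "__main__":
    rate = error_rate_per_kb(TOTAL_MUTATIONS, GENES, MEAN_GENE_LENGTH_BP)
    print(f"{rate:.2f} mutations/kb, i.e. 1 mutation per {1 / rate:.2f} kb")
    print(spectrum_percent(MUTATION_COUNTS))
```

Under these assumptions the rate is about 1.13 mutations per kb (1 mutation per about 0.89 kb), consistent with both the "1 mutation per 0.9 kb" and the "1.1 mutations per kb" figures quoted in the text.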
There are several major differences in approach and efficacy between the method reported here and previously described protocols (Table 6). First, some of the reported gene synthesis methods use oligonucleotides that have undergone additional downstream purification (see Table 6). The protocol described here uses desalted oligonucleotides with no further purification to accurately produce different DNA fragments. Because unpurified oligonucleotides are cheaper, the method described here has a lower cost, a strong advantage for both low- and high-throughput protocols. Second, this method uses 60-nt oligonucleotides, whereas some reported methods use oligonucleotides shorter than 60 nt. The number of oligonucleotides required for each assembly reaction is therefore significantly reduced while the success rate of gene synthesis is maintained. Although longer oligonucleotides can carry more errors, the data reported here show that increasing oligonucleotide length from 40 to 60 nt had no impact on the percentage of correctly synthesised DNA sequences. Thus, for the protocol reported here, 60-nt oligonucleotides are believed to provide the best balance between error rate and production cost. Finally, the error rate of the gene synthesis method reported here is lower than or equal to that of previously reported methods (Table 6), showing that this platform is efficient and robust for synthesising multiple genes simultaneously.
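The reduction in oligonucleotide number follows from tiling arithmetic: n oligonucleotides of length L sharing o-bp overlaps cover L + (n - 1)(L - o) bp. The sketch below assumes that both the 40-nt and the 60-nt designs keep the 20-bp overlaps used here (for 40-nt oligonucleotides this leaves no gap), which may not match the designs listed in Table 6, and that cost scales with the number and length of oligonucleotides ordered. `oligos_needed` is a hypothetical helper.

```python
import math


def oligos_needed(gene_length: int, oligo_len: int, overlap: int) -> int:
    """Minimum number of tiled oligos needed to cover a gene.

    n oligos of length L with overlap o cover L + (n - 1) * (L - o) bp,
    so n = 1 + ceil((gene_length - L) / (L - o)). Assumes overlap < oligo_len.
    """
    if gene_length <= oligo_len:
        return 1
    return 1 + math.ceil((gene_length - oligo_len) / (oligo_len - overlap))


if __name__ == "__main__":
    for length in (240, 500):
        for oligo_len in (40, 60):
            n = oligos_needed(length, oligo_len, overlap=20)
            # n * oligo_len is an upper bound: the last oligo may be trimmed.
            print(f"{length}-bp gene, {oligo_len}-nt oligos: {n} oligos, <= {n * oligo_len} nt")
```

For a 240-bp gene this gives 6 oligonucleotides (at most 360 nt) with 60-nt oligonucleotides versus 11 (at most 440 nt) with 40-nt oligonucleotides; for a 500-bp gene, 12 versus 24.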
The ability to synthesize DNA sequences de novo is rapidly emerging as a way to improve the speed, accuracy and simplicity of recombinant DNA technology. Here, we have optimized a novel large-scale gene synthesis platform for the efficient production of small genes (<0.5 kb). The genes were cloned directly into an E. coli expression vector using a fully automated protocol. This gene synthesis approach combines high PCR assembly and cloning efficiencies with low error rates. The error rate of the large-scale method described here is 1.1 mutations per kb. Low error rates make additional error-removal steps unnecessary, such as those that use mismatch-recognizing proteins to remove mutations from synthesized DNA. A correct clone was identified for every gene after screening at most three colonies, which reduces the labour required to select and validate recombinant clones. The use of overlapping oligonucleotides combined with KOD DNA polymerase provides a powerful alternative to conventional synthesis protocols. All oligonucleotides are at most 60 nt long, with 20-bp overlap regions and 20-nt gaps. This reduces the number of oligonucleotides required per gene and therefore lowers costs. The PCR-based gene synthesis method described here is an optimization of the simplified gene synthesis (SGS) method. However, among other differences, the average primer length in the protocol described in this study is greater than that used in SGS methods. In conclusion, the gene synthesis approach described here is a simple, accurate and robust system that can be used to construct large numbers of de novo DNA molecules at low cost and in a short time for a variety of applications.
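The link between a low error rate and the small number of colonies that had to be screened can be checked with a rough model. It assumes (this is not stated in the article) that errors occur independently, so the number of errors per clone is Poisson-distributed with mean equal to the error rate times the gene length; a clone is then error-free with probability exp(-rate × length), and the number of clones picked per gene follows a geometric distribution. The helper names are hypothetical.

```python
import math
from typing import List, Tuple


def p_clone_correct(errors_per_kb: float, gene_length_bp: float) -> float:
    """Poisson probability that a clone carries zero errors (assumed model)."""
    expected_errors = errors_per_kb * gene_length_bp / 1000
    return math.exp(-expected_errors)


def expected_genes_by_clones(
    n_genes: int, p: float, max_clones: int = 3
) -> Tuple[List[float], float]:
    """Expected genes first resolved at clone 1..max_clones, plus those unresolved.

    A gene is first resolved at clone k with probability p * (1 - p) ** (k - 1).
    """
    resolved = [n_genes * p * (1 - p) ** (k - 1) for k in range(1, max_clones + 1)]
    unresolved = n_genes * (1 - p) ** max_clones
    return resolved, unresolved


if __name__ == "__main__":
    p = p_clone_correct(errors_per_kb=1.1, gene_length_bp=240)
    resolved, unresolved = expected_genes_by_clones(96, p)
    print(f"P(error-free clone) = {p:.3f}")
    print("expected genes resolved at clone 1, 2, 3:", [round(x, 1) for x in resolved])
    print(f"expected genes still unresolved after 3 clones: {unresolved:.1f}")
```

With 1.1 mutations per kb and 240-bp genes, the model gives an error-free probability of about 0.77 per clone, close to the 80.2% observed in the first screen, and predicts that roughly one of the 96 genes would still lack a correct clone after three picks. Given its simplifying assumptions, it is best read only as a plausibility check.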
Codon Adaptation Index
Dual asymmetrical PCR
Improved PCR synthesis
Ligation-Independent Cloning protocol
PCR-based accurate synthesis
Polymerase chain assembly
Polymerase chain assembly using DNA template
Polymerase chain assembly DNA template-free
PCR-based two-step DNA synthesis
Simplified gene synthesis
Thermodynamically balanced inside-out technology
Tobacco Etch Virus protease.
Ashman K, Matthews N, Frank RW. Chemical synthesis, expression and product assessment of a gene coding for biologically active human tumour necrosis factor alpha. Protein Eng. 1989;2:387–91.
Hayashi N, Welschof M, Zewe M, Braunagel M, Dübel S, Breitling F, et al. Simultaneous mutagenesis of antibody CDR regions by overlap extension and PCR. Biotechniques. 1994;17:310–2, 314–5.
Hoover DM, Lubkowski J. DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res. 2002;30:e43.
Stemmer WP, Crameri A, Ha KD, Brennan TM, Heyneker HL. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene. 1995;164:49–53.
Strizhov N, Keller M, Mathur J, Koncz-Kálmán Z, Bosch D, Prudovsky E, et al. A synthetic cryIC gene, encoding a Bacillus thuringiensis delta-endotoxin, confers Spodoptera resistance in alfalfa and tobacco. Proc Natl Acad Sci U S A. 1996;93:15012–7.
Gao X, Yo P, Keith A, Ragan TJ, Harris TK. Thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis: a novel method of primer design for high-fidelity assembly of longer gene sequences. Nucleic Acids Res. 2003;31:e143.
Young L, Dong Q. Two-step total gene synthesis method. Nucleic Acids Res. 2004;32:e59.
Gordeeva TL, Borschevskaya LN, Sineoky SP. Improved PCR-based gene synthesis method and its application to the Citrobacter freundii phytase gene codon modification. J Microbiol Methods. 2010;81:147–52.
Wu G, Wolf JB, Ibrahim AF, Vadasz S, Gunasinghe M, Freeland SJ. Simplified gene synthesis: a one-step approach to PCR-based gene construction. J Biotechnol. 2006;124:496–503.
Binkowski BF, Richmond KE, Kaysen J, Sussman MR, Belshaw PJ. Correcting errors in synthetic DNA through consensus shuffling. Nucleic Acids Res. 2005;33:e55.
Carr PA. Protein-mediated error correction for de novo DNA synthesis. Nucleic Acids Res. 2004;32:e162.
Wan W, Li L, Xu Q, Wang Z, Yao Y, Wang R, Zhang J, Liu H, Gao X, Hong J. Error removal in microchip-synthesized DNA using immobilized MutS. Nucleic Acids Res. 2014;42:e102.
Xiong A-S, Yao Q-H, Peng R-H, Li X, Fan H-Q, Cheng Z-M, et al. A simple, rapid, high-fidelity and cost-effective PCR-based two-step DNA synthesis method for long gene sequences. Nucleic Acids Res. 2004;32:e98.
Tindall KR, Kunkel TA. Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase. Biochemistry. 1988;27:6008–13.
Takagi M, Nishioka M, Kakihara H, Kitabayashi M, Inoue H, Kawakami B, et al. Characterization of DNA polymerase from Pyrococcus sp. strain KOD1 and its application to PCR. Appl Environ Microbiol. 1997;63:4504–10.
Xiong A-S, Yao Q-H, Peng R-H, Duan H, Li X, Fan H-Q, et al. PCR-based accurate synthesis of long DNA sequences. Nat Protoc. 2006;1:791–7.
Tian J, Ma K, Saaem I. Advancing high-throughput gene synthesis technology. Mol Biosyst. 2009;5:714.
Yang JK, Chen FY, Yan XX, Miao LH, Dai JH. A simple and accurate two-step long DNA sequences synthesis strategy to improve heterologous gene expression in Pichia. PLoS One. 2012;7:2–8.
LeProust EM, Peck BJ, Spirin K, McCuen HB, Moore B, Namsaraev E, et al. Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process. Nucleic Acids Res. 2010;38:2522–40.
Zampini M, Stevens PR, Pachebat JA, Kingston-Smith A, Mur LAJ, Hayes F. RapGene: a fast and accurate strategy for synthetic gene assembly in Escherichia coli. Sci Rep. 2015;5:11302.
Ana Filipa Sequeira was supported by Fundação para a Ciência e a Tecnologia (Lisbon, Portugal) and NZYTech through the individual fellowship SFRH/BD/51602/2011. This work was supported by the VENOMICS project, European project grant N° 278346, through the Seventh Framework Program (FP7 HEALTH 2011–2015). The VENOMICS project involves a collaboration between several research institutions and companies in Europe: AFMB, Aix-Marseille Université (France), CEA Saclay (France), NZYTech (Portugal), Sistemas Genomicos (Spain), Université de Liège (Belgium) and Zealand Pharma (Denmark).
Availability of supporting data
All data generated or analysed during this study are included in this published article and its supplementary information files.
AFS and CMGAF designed the study and wrote the manuscript. AFS made substantial contributions to the acquisition of data. JLAB supported data acquisition. AFS and CMGAF analysed and interpreted the data. All authors read and approved the final version of the manuscript.
The authors declare competing financial interests, since NZYTech provides gene synthesis services. Renaud Vincentelli declares no competing interests.
Consent for publication
Ethics approval and consent to participate
Sequence properties of genes A, B and C used to optimize the PCR conditions. (XLSX 10 kb)
Primer sequences used to assemble gene A using different gene synthesis approaches. (XLSX 12 kb)
Primer sequences used to assemble genes B and C. (XLSX 9 kb)
PCR programs used to optimize the PCR profile of a PCR-assembly reaction. (XLSX 9 kb)
Sequence properties of 96 genes synthesised using the HTP gene synthesis platform. (XLSX 23 kb)
Mutation types identified in the genes produced. (XLSX 15 kb)