Development of a gene synthesis platform for the efficient large scale production of small genes encoding animal toxins

Background Gene synthesis is becoming an important tool in many fields of recombinant DNA technology, including recombinant protein production. De novo gene synthesis is quickly replacing the classical cloning and mutagenesis procedures and allows generating nucleic acids for which no template is available. In addition, when coupled with efficient gene design algorithms that optimize codon usage, it leads to high levels of recombinant protein expression. Results Here, we describe the development of an optimized gene synthesis platform that was applied to the large scale production of small genes encoding venom peptides. This improved gene synthesis method uses a PCR-based protocol to assemble synthetic DNA from pools of overlapping oligonucleotides and was developed to synthesise multiples genes simultaneously. This technology incorporates an accurate, automated and cost effective ligation independent cloning step to directly integrate the synthetic genes into an effective Escherichia coli expression vector. The robustness of this technology to generate large libraries of dozens to thousands of synthetic nucleic acids was demonstrated through the parallel and simultaneous synthesis of 96 genes encoding animal toxins. Conclusions An automated platform was developed for the large-scale synthesis of small genes encoding eukaryotic toxins. Large scale recombinant expression of synthetic genes encoding eukaryotic toxins will allow exploring the extraordinary potency and pharmacological diversity of animal venoms, an increasingly valuable but unexplored source of lead molecules for drug discovery. Electronic supplementary material The online version of this article (doi:10.1186/s12896-016-0316-3) contains supplementary material, which is available to authorized users.


Background
Synthetic biology, an interdisciplinary branch of biology, is quickly becoming one of the most attractive areas of research thanks to the recent developments in gene synthesis technology. In combination with intelligent gene design, gene synthesis is emerging as a valuable tool to support recombinant protein expression. De novo gene design allows optimizing codon usage to the recombinant host system thus promoting the effective operation of the cellular translational machinery. In addition, in cases where the nucleic acid template is not available, gene synthesis allows creating DNA molecules de novo. The exponential growth of genomic and metagenomic databases and the current limitations in using this highly useful sequence information due to the lack of tangible DNA are promoting the rapid development of novel gene synthesis technologies.
In recent years, a variety of gene synthesis methodologies have been developed based on the assembling of oligonucleotides into complete genes. Early approaches advanced to synthesize nucleic acids used the enzymatic ligation of pre-formed duplexes of phosphorylated overlapping oligonucleotides [1]. Subsequently, selfpriming PCR [2], PCR assembly [3], Polymerase chain assembly (PCA) [4] and template-directed ligation [5] were described as efficient concepts for de novo gene synthesis of nucleic acids. Recently, methods based on a two-step approach were reported for the production of long DNA sequences. Examples of these technologies are the PCR-based thermodynamically balanced insideout technology (TBIO) [6], the two-step total gene synthesis method [7] that combines both dual asymmetrical PCR (DA-PCR) and overlap-extension (OE-PCR), the PCR-based two-step DNA synthesis (PTDS) [8] and PCR-based accurate synthesis (PAS) [9].
Lately, improvements in PCR-based gene synthesis methods, as exemplified by the development of the improved PCR synthesis (IPS) and the simplified gene synthesis (SGS) protocols [8,9], have been described and incorporate significant simplifications over earlier strategies. SGS uses oligonucleotides of 40 nucleotides (nt) in length and 18-20 nt of overlap region, which are assembled in a unique PCR-assembly reaction leading to the direct construction of the full-length DNA molecule. The simplicity of this protocol combined with its relative low cost, since there no requirement for phosphorylation or purification of the oligonucleotides exists, are a solid base for the development of even more effective PCRbased methods. However, major drawbacks persist and effective improvements need to be implemented in current synthetic protocols to allow their translation to a large scale. One of the major bottlenecks of current gene synthesis protocols consists on the quality of the oligonucleotides used for nucleic acid assembly. It is known that all current gene synthesis methods accumulate errors in the final synthetic molecules. Sequence errors usually derive from the incorporation of imperfect synthetic oligonucleotides or result from low fidelity rates associated with the enzymatic assembling step. Current oligonucleotide synthesis methods produce sequences that are often prematurely terminated, or comprise internal mutations (error rates range from 1 to 10 mutation per kilobase (kb)) [10]. In addition, chemical synthesis of DNA molecules usually not only involve moderate to high error rates but also high costs. Moreover, the chemical synthesis of a desired gene also depends on the accuracy of the DNA polymerase used to assemble the oligonucleotides in a final DNA sequence. Therefore, DNA errors are inevitable and it is often necessary to remove the incorrect synthetic DNA molecules using enzymatic methods [11,12]. Improvements in oligonucleotide quality, error correction and DNA polymerase efficacy are thus urgently required.
Conventionally, PCR-based gene synthesis is employed to produce a single gene at a time. Thus, development of automated platforms that effectively generate large libraries of nucleic acids is urgently needed. The different steps leading to a single PCR-assembly strategy need to remain simple, accurate and robust when extended to the assembly of multiple genes simultaneously. To develop large scale methods, many factors that affect the efficiency of gene assembly, such as DNA polymerases performance or oligonucleotide concentration and quality require optimization. This work describes different approaches carried out to optimize current gene synthesis protocols. The data was integrated to develop a novel platform which was applied to efficiently synthesize and clone a large number of nucleic acids encoding venom peptides. This automated platform can be translated to the rapid generation of complex gene libraries encoding different families of biotechnologically relevant and valuable proteins and peptides.

Gene design
The original purpose of this research was to optimize protocols for the synthesis of small genes encoding eukaryotic peptides for expression in Escherichia coli. Three genes of different lengths (A: 290 bp, B: 260 bp, and C: 329 bp) were selected to develop these studies. Gene A encodes the alpha-elapitoxin-Nk2a toxic protein isolated from the Naja kaouthia venom, and genes B and C encode two different venom peptides of unknown function. Venom genes were designed by backtranslating the peptide sequence and optimizing codon usage for high levels of expression in E. coli. Guaninecytosine (GC) content was set to vary between 40 and 60%. Gene design maximized stable mRNA molecules, minimized the presence of repeated sequences and avoided the appearance of E. coli regulatory sequences such as promoters, activators or operators. In addition, Codon Adaptation Index (CAI) value was set to be higher than 0.8. DNA sequences of the three genes are presented in Additional file 1: Table S1.

Oligonucleotides and purification
Oligonucleotides were synthesized by three different suppliers (A, B and C) using the smaller scale available with standard desalting. Reverse-phase cartridge and reverse-phase HPLC purifications were also tested for oligonucleotides obtained from supplier C. Each oligonucleotide was reconstituted in 10 mM Tris-HCl (pH 8.5) to a 100 μM final concentration and kept at −20°C. Oligonucleotides were used individually or in mixtures at concentrations described below.

Primer design
The DNA sequence of each gene was used as a template to design the assembly oligonucleotides by dividing the entire sequence into overlapping primers with defined lengths. The external oligonucleotides, termed outer primers, correspond to the external forward and reverse primers. For strategy A (see below), the outer primers include a complementary sequence of 15 bp to promote plasmid re-circulation through homologous recombination (Fig. 1a). In strategy B (see below) the outer primers contained an additional sequence at the 5'terminus of both forward and reverse primers for Fig. 1 Schematic diagram representing the two different approaches used for gene synthesis: a. Polymerase chain assembly using DNA template (PCA-DT), and b. Polymerase chain assembly DNA template-free (PCA-DTF). Each oligonucleotide is represented as an arrow; black arrows correspond to internal (inner) oligonucleotides while external (outer) oligonucleotides are denoted as orange arrows. Each strand of the desired gene is dissected into two or more oligonucleotides that are amplified (A) or overlap extended (b) by a DNA polymerase. A1-Two long oligonucleotides were designed with a 20 bp overlap region with the cloning vector; A2-Two sets of 40-mer oligonucleotides containing the 5'-end and 3'-end sequences of the desired gene were assembled by PCR; A3 -Successive oligonucleotides with 40 nt in length were designed with a 20 nt overlaps regions between adjacent oligonucleotides to construct the full-length of cloning vector containing the desired gene. The PCR products from A1, A2 and A3 strategies were inserted in E. coli cells through homologous recombination. Strategy B corresponds to a single-step PCR assembly step that does not require template DNA. B1 -Gapless 40-mer oligonucleotides containing 20 nt overlap regions between primers were used to assemble the synthetic gene; B2 -40-mer oligonucleotides with 15 bp overlaps between forward and reverse primers, with gaps of 10 nt were mixed to produce the gene of interest; B3 -The synthetic gene was synthesized by mixing a 60-mer overlapping oligonucleotides containing gaps of 20 nt. After overlap extension by DNA polymerase, gene fragments from B1, B2 and B3 were cloned into a vector using a ligation-independent cloning method. All outer primers contain a vector complementary region at the 5'-end (highlighted in orange rectangles) or a 16 bp complementary sequence (highlighted in blue rectangles) to facilitate the cloning reaction cloning into the cloning vector through a Ligation-Independent Cloning protocol (LIC) (Fig. 1b). Internal oligonucleotides (termed inner primers) are used in higher numbers than outer primers. Depending on the gene assembly strategy used, the oligonucleotides were designed with gaps between adjacent primers and comprising 15-20 bp overlap regions. The sequence of all oligonucleotides used in these studies are displayed in Additional file 2: Table S2.

Novel strategies to synthesize small genes
In order to develop an efficient, cost-effective and low error rate strategy to produce synthetic genes with reduced size, two different strategies to construct optimized DNA sequences were initially explored: (A) Polymerase chain assembly using DNA template (PCA-DT) and (B) Polymerase chain assembly DNA template-free (PCA-DTF) ( Fig. 1a and b, respectively). PCA-DT was developed to decrease the time involved in traditional gene synthesis methods since this method combines both synthesis and cloning in a single reaction. In step 1, two long (A1) or a pool of small oligonucleotides containing the gene sequence (A2 and A3) were mixed with the cloning vector and then assembly proceeds using a DNA polymerase in a typical cyclic temperature reaction ( Fig. 1a) (Table 1). In step 2, the product of the PCR amplification combining both the newly synthesized gene and the vector were used to transform E. coli cells. PCA-DTF strategy is based on previously reported methods used to produce synthetic genes in a single PCR reaction [4,9,11]. All oligonucleotides (inner and outer) were pooled together and assembled in a single polymerase chain reaction. The outer primers were used in a higher concentration than inner primers to ensure the construction of full-length sequence of synthetic gene. Different approaches for oligonucleotide design were tested ( Fig. 1, B1, B2 and B3). The PCA-DTF method requires a subsequent ligation-free cloning step to insert the synthetic gene into the cloning vector.
Three different strategies based on PCA-DT method were used to synthesize gene A (290 nt). Two, fourteen and twelve oligonucleotides were designed to synthesize full-length gene A following strategies A1, A2 and A3, respectively. For A1 strategy, the gene sequence was dissected in two long oligonucleotides of 135 nt, including a 20 nt overlap with the cloning vector. To produce the synthetic gene using A2 strategy, fourteen oligonucleotides of 20-40 nt with a 20 bp overlap region between forward and reverse primers and no gaps between adjacent oligonucleotides were designed. The length of twelve oligonucleotides used in A3 was 35-40 nt and the overlapping region between successive oligonucleotides was 20 bases. were designed (Fig. 1, B1 and B2). Strategy B3 used larger oligonucleotides with 60 nt including 20 nt overlaps ( Fig. 1, B3). Outer primers include an additional ligation independent sequence (27 nt on the forward primer and 32 nt on the reverse primer), in order to allow ligation-independent cloning and a 18 bp encoding the Tobacco Etch Virus (TEV) protease. Oligonucleotide assembly was performed as described above using Pfu Turbo DNA polymerase. Assembly reaction was subjected to one cycle of initial denaturation at 95°C for 5 min, followed by 26 cycles of denaturation at 95°C for 30 s; annealing at 55°C for 30 s; and extension at 72°C for 30 s. PCR amplification products were column purified, as described above, and cloned into pDONR201 vector using Gateway technology (see below).

Optimization of PCR conditions for successful gene synthesis protocol
Efficacy and accuracy of DNA polymerases and the quality and concentration of primers are two critical parameters known to influence gene synthesis. In addition, annealing temperatures and times used for denaturation, annealing and extension during PCR may affect nucleic acid yields. To optimize these parameters, two genes (B and C) were synthesised using gene synthesis strategy B3. Six and eight oligonucleotides with 58-60 nt and a 20 bp gap between primers were used to synthesise genes B and C, respectively. The sequences of overlapping oligonucleotides used to produce genes B and C are presented in Additional file 3:  Table S4.

Cloning and sequencing
After PCR assembly, resulting nucleic acids were purified, inserted into a suitable vector and the integrity of each gene was confirmed by DNA sanger sequencing. Synthesized genes produced by strategy A were ligated into pNZY28 cloning plasmid using the homologous recombination machinery present in E. coli cells. Gene products from strategies B1, B2 and B3 which contained attb1 and attb2 sequences at its 5' and 3'-ends, respectively, were cloned into pDONR201 vector (ThermoFisher Scientific) using the Gateway cloning system (ThermoFisher Scientific). Cloning reaction mixtures were used to transform E. coli DH5α competent cells. For each transformation, one bacterial colony was inoculated and grown in liquid LB medium supplemented with 100 μg/mL of ampicillin. Plasmids were purified and recombinant integrity of inserted nucleic acids confirmed by sequencing.
Construction of a novel gene synthesis platform for the large scale production of small synthetic genes An integrated gene synthesis platform was developed for the efficient production of small synthetic genes. This platform combines automation, simplicity and robustness, while decreasing the error rate associated with conventional gene synthesis methods. Initial experiments described above defined the most appropriate PCR assembly protocol. Subsequent experiments evaluated the efficacy of the protocol when applied for the simultaneously synthesis of 96 genes encoding venom peptides. Ninety six genes were designed by back-translating corresponding peptide sequences and by optimizing codon usage for high levels of expression in E. coli. Codons were selected randomly using a Monte Carlo approach according to E. coli codon usage of highly expressed genes. Genes were designed to have a GC content between 40 and 60% and a codon adaptation index (CAI) value higher than 0.8. The sequences of the 96 optimized genes are presented in Additional file 5: Table S5. The pool of primers required to synthesize 96 synthetic genes encoding venom peptides were designed to have 50-60 nt in length, an overlap region of 20 nt between forward and reverse primers with a gap sequence of 20 nt, and included an additional 16-bp conserved sequences at 5'-terminus of both forward and reverse outer primers to allow ligation-independent cloning. The 96 genes were produced from 96 mixes of six oligonucleotides with 50-60 nt in length and 20 nt overlaps. Oligonucleotides were synthesised by Integrated DNA Technologies at the smallest scale (primer solutions at 5 μM) with desalting purification. The two outer primers were used at a final concentration of 800 nM while the inner primers were pooled together in an equimolar mixture to achieve the final concentration of 20 nM. Primer dilutions and PCR assembly was carried out in a 96-well plate format using a Tecan workstation (Switzerland). KOD Hot Start DNA polymerase (EMD Millipore) was used for PCR assembly using optimized conditions to minimize primer-dimer formation and nonspecific amplifications. PCR reactions were performed in a 50 μL total volume and consisted of 0.2 mM dNTPs, 1.5 mM MgCl 2 , 1× reaction buffer and 1 unit of KOD Hot Start DNA polymerase. PCR assembly reactions were carried out in a 96-well PCR plate format. The cycling parameters were as follows: 1 cycle of 95°C for 2 min; 26 cycles of 95°C for 20 s, 60°C for 8 s, and 70°C for 3 s. After PCR assembly, assembled PCR products were visualized by agarose gel electrophoresis and purified through silica-base chromatography in a Tecan liquid handler (Switzerland). Purified PCR products were cloned into pHTP1 expression vector (NZYTech, Ltd) using the NZYEasy cloning kit (NZYTech, Ltd) that follows a LIC technology. Gene assembly products were mixed with 120 ng of linearized vector, using a molar ratio of 1:5 (vector:insert). Cloning reactions were performed in 10 μL volume in a 96-well PCR plate and preceded for 1 h at 37°C on a heating block. The mixtures were then incubated at 80°C for 10 min followed by 10 min at 30°C. Recombinant plasmids were transformed using a highthroughput method into E. coli DH5α competent cells and spread on LB agar plates supplemented with 50 μg/mL of kanamycin. After overnight incubation at 37°C, only one colony per transformation was picked and grown in 5 mL of LB kanamycin medium in 24-deep-well plates sealed with gas-permeable adhesive seals. Cultures were incubated at 37°C for~16 h and cells were then harvested at 1,500 × g for 15 min. Plasmids were purified from bacterial pellets in a Tecan workstation (Switzerland), and subsequently the DNA sequence of each gene was verified by Sanger sequencing. In case the DNA sequence did not correspond with the designed gene, a second and eventually third colony was picked for sequencing analysis.

Results and discussion
Synthesis and assembly of the 213 nt gene encoding alphaelapitoxin-Nk2a toxin using PCA-DT and PCA-DTF methods The gene encoding the toxic peptide alpha-elapitoxin-Nk2a was designed to maximize expression in E. coli. Initial experiments aimed at identifying the most appropriate strategy for the synthesis of small genes and attempted to reduce the number of steps involved in traditional gene synthesis approaches. Thus, the efficiency of PCA-DT and PCA-DTF PCR-based methods was tested for the synthesis of gene A, which has a size of 290 nt (Fig. 1). To ensure simplicity and speed in the synthetic gene process, PCA-DT does not involve an additional cloning step. Using method A1 (Fig. 1), gene A was synthesised using pNZY28 vector as DNA template and employing two long oligonucleotides (135 nt) to amplify the full-length plasmid sequence containing half of the gene sequence in each 5' and 3' ends. Long oligonucleotides are more prone to incorporate errors. Thus, in alternative, gene A was also amplified using a set of 14 and 12 overlapping oligonucleotides (Fig. 1, strategies A2 and A3, respectively), which contain a 20 nt sequence that hybridises with the cloning plasmid. In contrast, PCA-DTF methods use a template-less approach to assemble the artificial gene. The six methods described in Fig. 1 were employed to synthesize gene A. The data, a b c Fig. 2 The efficacy of PCA-DT and PCA-DTF methodologies to generate small nucleic acids. Panel a: synthesis of gene A was performed using a vector DNA template (A1, A2 and A3) and the PCR products correspond to a 3,099 bp DNA fragment, which combines gene A sequence plus pNZY28 vector sequence. b: gene A was also assembled using different pools of overlapping oligonucleotides (B1, B2 and B3); lane C − corresponds to a negative control reaction performed at the same conditions, without addition of primers. c: Effect of primer design on percentage of clones without errors. M1: NZYLadder III, M2: NZYLadder I presented in Fig. 2, revealed that all strategies effectively generated the toxic gene. However, yield of the assembled target nucleic acid was higher using strategies B when compared with the quantity of PCR product obtained using the PCA-DT. Although strategy A does not require an additional step to insert the synthetic gene into the cloning plasmid, globally the process is more tedious due to the long periods required for the amplification of large nucleic acids, as PCR involves both the toxic gene and the vector (~3 Kb, pNZY28 vector plus target gene). In addition, cloning efficiency, evaluated through colony-PCR, revealed an approximately 95% cloning efficacy of the Gateway system versus 80% when using the self-ligation process of strategy A. These results suggest that strategies including a DNA template and involving plasmid re-circularization are probably not the best option for the synthesis of small genes as they are more tedious while leading to lower cloning efficiencies. These two issues are particularly important when protocols require automation. In order to verify if the length of the oligonucleotides influences the appearance of errors in synthetic genes, we selected 24 clones synthesized by strategies B1, B2 and B3 for DNA sequence verification using Sanger sequencing. Analysis of the 72 sequences revealed that approximately 80% of the clones for each one of the three strategies presented the correct DNA sequence (Fig. 2c). Thus, increasing primer size from 40 to 60 nt has no impact in the number of errors in the resulting synthetic DNA. This primer size was also used by Stemmer and colleagues to successfully synthesise two long DNA fragments [4], although the error rate associated with the method was not reported. Therefore, taken together, the data suggest that the PCA-DTF gene assembly method that uses a set of 60 nt overlapping oligonucleotides with 20 nt gaps, is the most convenient strategy for the synthesis of small genes as it provides high gene synthesis yields and efficacy with increased cloning efficiencies.

Performance of various thermostable DNA polymerases for gene synthesis
Taq, Kod and Pfu polymerases have been commonly used for the production of synthetic genes following PCR-based methods [4,7,13]. However, Taq polymerase is known to be error-prone [14]. The use of Kod and Pfu polymerases allow higher accuracy in gene synthesis although the elongation rate of Pfu polymerases is lower [15]. Here, the efficacy of four different DNA polymerases (KOD Hot Start DNA polymerase, Q5® Hot Start HF, Pfu Turbo DNA polymerase, and Taq DNA polymerase) for the production of gene B (260 bp), using B3 strategy PCA-DTF (described above) was analysed. The data, presented in Fig. 3, revealed that the four polymerases effectively assembled the 260 bp gene. However, KOD Hot Star DNA polymerase seems to express a higher performance when compared with the other enzymes. These data suggest that efficacy of Kod polymerases is higher than Taq and Pfu enzymes for assembling small genes. After cleaning the PCR products, genes were cloned into pDONR201 vector and 24 clones assembled by each one of the four DNA polymerase were sequenced. The data, presented in Table 2,  revealed that appearance of mutations is more frequent for Taq, followed by Q5® Hot Start HF and Pfu Turbo DNA polymerase. In contrast, only five out of the 22 recombinant plasmids containing the synthetic gene B assembled by KOD Hot Star DNA polymerase presented errors, which reflects one mutation per 1.15 kb. Deletions and substitutions were the most frequent errors identified in the 96 variants of gene B sequenced. As expected, the data suggested that KOD Hot Start DNA polymerase is more accurate than the other three DNA polymerases due to a higher fidelity. These results are in line with previous studies [8,9,16], which have revealed that usage of high fidelity DNA polymerases decreases the number of errors introduced in synthetic genes during PCR amplification. Moreover, the PCA-DTF procedure appears to be very efficient with KOD Hot Start DNA polymerase due to the rapid elongation rates presented by this DNA polymerase; completion of the gene synthesis protocol is achieved in less than 40 min.

Oligonucleotide concentration influences the efficacy of gene synthesis
Since PCA-DTF is suggested to be the best method to produce small synthetic genes and KOD Hot Star DNA polymerase is the most effective enzyme to apply in these protocols, we analysed the influence of  Fig. 4a, suggest that gene synthesis is most effective with 30 nM of inner primers. In addition, the best concentrations of outer primers were of 600 and 1000 nM. In order to define more precisely the best concentrations of primers, outer primer concentration was fixed at 800 nM and concentrations of inner primers varied from 10 to 30 nM. Data suggest that at 800 nM of outer primers, the optimal concentration of inner oligonucleotides is 20 nM (Fig. 4b). The assembly reaction was also performed under different concentrations of dNTPs. Interestingly, the results suggest that concentrations of dNTPs below 0.2 mM are not appropriate for gene synthesis. Thus, for a robust and successful PCA-DTF procedure a concentration of 0.3-0.4 mM of dNTPs seems to be the most effective (Fig. 4).

Effect of cycling temperatures on the efficiency of gene synthesis
In the previous assembly reactions following PCA-DTF, annealing temperature was set to 60°C. Several studies [3] have suggested that factors such as melting temperature (T m ) and GC content affect optimal assembly. Thus, the efficiency of synthesis of gene B was tested using a gradient of annealing temperatures (50°C, 52.3°C, 54.6°C, 59.6°C and 62°C) applying the optimal PCR conditions described in the previous section. The data, presented in Fig. 5, suggest that oligonucleotide assembly occurs at temperatures ranging from 50 to 62°C, although yields of nucleic acid seems to increase at higher annealing temperatures. In addition, the effect of the number of PCR cycles in the efficiency of gene synthesis was tested by synthesizing gene C using 22, 24 and 26 thermal cycles. The data revealed that, as expected, the quantity of the amplified product increases with the number of thermal cycles employed (Fig. 5). Likewise, when denaturation last 20 s, an extension of 3 s produces higher DNA yields than a 1 s extension. In addition, annealing of overlapping oligonucleotides during 8 s is more favourable than for 10 s. Therefore, the data revealed that 26 thermal cycles of denaturation (20 s), annealing (8 s) and extension (3 s) step are optimal for PCR assembly following the PCA-DTF method.

Effect of oligonucleotide source in the efficacy of gene synthesis
It is well known that efficacy of gene synthesis directly depends on the quality of the synthetic oligonucleotides used for DNA assembly [17]. Current chemical synthesis methods usually produce oligonucleotides that are prematurely terminated or comprise internal insertions  or deletions [17]. To determine how the oligonucleotide source modulates the production of error-free DNA fragments, gene C was synthesised using desalted, reverse-phase cartridge and reverse-phase HPLC purified primers obtained from three different suppliers. Gene C was assembled using PCR conditions defined above and five different oligonucleotide sources were analysed. The results, presented in Fig. 6, show that oligonucleotides from supplier B displayed the best performance. DNA plasmids of 16 recombinant clones for each condition were analysed by Sanger sequencing. The highest percentage of clones without errors was identified in genes synthesized with primers for supplier B which were not subjected to any purification (Table 3). PCR products assembled using reverse-phase cartridge and HPLC oligonucleotides have a lower percentage of clones without errors (B2 -50% and B3 -56%, respectively) when compared with exclusively desalted oligonucleotides. Thus, it is noteworthy observing that oligonucleotide purification does not solve the percentage of mutation observed in artificial genes. The most frequent mutation identified in the 80 recombinant clones was a single base deletion (44%, see Table 3 and Additional file 6: Table S6). These results suggest that truncated versions of the oligonucleotides (n-1) are difficult to remove by accessory purification methods as the desalted oligonucleotides from supplier B contain a lower frequency of deletions. In contrast, previous studies have shown that PAGE oligonucleotide purification is recommended for the successful production of synthetic genes by PCR assembly [8,13,16,18]. However, the error rates reported in these studies is identical (~1 error per Kb of synthetic DNA) to our gene synthesis method revealing that oligonucleotide purification is not crucial for the accurate production of DNA fragments. In our high-throughput study, the purification of oligonucleotides can be a disadvantage since the production cost would significantly increase.
Large scale synthesis of genes encoding venom peptides using an automated platform Previous optimized protocols were used to develop a platform for the simultaneous synthesis of small genes. Thus, the primary sequence of 96 venom peptides was used to design 96 genes that contained an average GC content of 49% and an average CAI of 0.86 (Table 4). To assemble the 96 genes, 576 (6 primers × 96 genes) oligonucleotides with a maximum of 60 nt were designed with an overlap region of 20 bp and a gap of 20 bp. In average, the genes had 240 bp in length and oligonucleotides were acquired without additional purification. Each gene was PCR assembled using the KOD Hot Start DNA polymerase. Outer primers were used at 800 nM (forward and reverse) while inner primers at a final concentration of 20 nM. PCR assembly was performed in 26 cycles of 95°C for 20 s, 60°C for 8 s and 3 s at 70°C. The 96 genes were assembled simultaneously in a 96well PCR plate and resulting nucleic acids analysed through agarose gel electrophoresis. The data, presented in Fig. 7, revealed that 94 out of the 96 genes were effectively assembled representing a 98% success rate of the gene synthesis protocol when applied to a large scale. After purification of the 96 generated PCR products, individual genes were sub-cloned into pHTP1 expression vector using a LIC method. The robustness and effectiveness of the pipeline was demonstrated when recombinant plasmids were sequenced to verify gene integrity. The initial screen of one clone per gene revealed that 77 genes (80.2%) were correct (Table 5). For 17 genes (17.7%) two clones were screened to identify an error-free DNA fragment. Finally, for 2 genes (2.1%) it was necessary to pick a third clone to obtain a correct DNA sequence. Thus, even for the two genes that apparently were amplified at a lower concentration it was possible to obtain a correct clone. In total, 26 mutations were identified in incorrect genes, leading to an overall error rate of 1 mutation per 0.9 kb. The majority of the identified mutations were deletions (77%), as it is expected from the incorporation of prematurely terminated oligonucleotide [19]. The remaining mutations were single-base substitutions (19%) and insertions (4%). There are some major differences between the approach and resulting efficacy of the method reported here when compared with previously described protocols (Table 6). Firstly, some of the reported gene synthesis methods involve usage of oligonucleotides subjected to subsequent downstream purifications (see Table 6). The protocol described here uses desalted oligonucleotides (no purification) to accurately produce different DNA fragments. Since price of non-purified oligonucleotides is reduced, the cost associated with the method described here is lower, which represents a strong advantage for both low and high-throughput protocols. Secondly, this method uses 60-bp oligonucleotides while some reported methods use oligonucleotides with a length below 60 bases. Thus, the number of oligonucleotides required for each assembly reaction using the protocol described here is significantly reduced while maintaining the success rate of gene synthesis. Although large primers can incorporate more errors, data reported here show that the increase of oligonucleotide size from 40 to 60 bp had no impact in the percentage of correct synthesised DNA sequences. Thus, for the protocol reported here, the use of 60-bp oligonucleotides is believed to provide the best balance between error rate and production cost. Finally, the error rate observed in the gene synthesis method reported here is lower or identical to previously reported methods ( Table 6), revealing that this platform is efficient and robust to synthesise multiple genes in simultaneous.  NM-not mentioned