PCR-based gene synthesis to produce recombinant proteins for crystallization
© Marsic et al. 2008
Received: 29 January 2008
Accepted: 29 April 2008
Published: 29 April 2008
Gene synthesis technologies are an important tool for structural biology projects, allowing increased protein expression through codon optimization and facilitating sequence alterations. Existing methods, however, can be complex and not always reproducible, prompting researchers to use commercial suppliers rather than synthesize genes themselves.
A PCR-based gene synthesis method, referred to as SeqTBIO, is described and used to efficiently assemble the coding regions of two novel hyperthermophilic proteins: a PAZ (Piwi/Argonaute/Zwille) domain, the siRNA-binding domain of an Argonaute protein homologue, and a deletion mutant of a family A DNA polymerase (PolA). The gene synthesis procedure is based on sequential assembly, so that homogeneous DNA products are obtained after each synthesis step without extensive manipulation or purification. Coupling the gene synthesis procedure to in vivo homologous recombination allows efficient subcloning and site-directed mutagenesis for error correction. The recombinant PAZ and PolA proteins were subsequently overexpressed in E. coli and used for protein crystallization. Crystals of both proteins were obtained and were suitable for X-ray analysis.
We demonstrate, using PAZ and PolA as examples, the feasibility of integrating gene synthesis, error correction and subcloning into a non-automated gene-to-crystal pipeline in which genes can be designed, synthesized and used for recombinant expression and protein crystallization.
Gene synthesis is a convenient method of obtaining sequence-verified cloned DNA, especially when the source biological material is not readily available (rare, lost or dangerous specimens), when codons need to be optimized for expression in a particular host, or when the desired sequence is chimeric or designed de novo. In a gene-to-structure pipeline based on X-ray crystallography, customizing the synthesized open reading frame to rapidly express recombinant protein for crystallization is especially useful when specific residues within a site, or entire regions of a protein, need to be altered to change its activity, stability or ability to crystallize.
Structural genomics consortia have exploited gene synthesis technology to produce difficult recombinant proteins for structural analysis. In most, if not all, cases, synthetic genes are fabricated commercially for convenience and time efficiency. The current affordability of synthetic oligonucleotides (the building blocks for gene synthesis) makes it possible for individual researchers to carry out gene synthesis projects in their own laboratories, provided the procedures can be streamlined and performed without intricate manipulations. In addition, combining practical gene synthesis techniques with quick cloning into recombinant expression systems, without extensive enzymatic requirements, would greatly facilitate protein structure investigations.
In this study, we use a PCR-based gene synthesis technique coupled to in vivo homologous recombination to quickly construct genes for protein production without any enzyme beyond those used in PCR. The coding sequences of two proteins derived from hyperthermophilic microorganisms were synthesized and transformed, together with a specially prepared expression vector, into a bacterial expression system without DNA digestion or ligation. Recombinant proteins were overexpressed from the synthetic gene constructs, purified and used for protein crystallization. Using this method, the entire procedure from gene to crystal screening can be completed within 2 weeks. The procedure should especially benefit small, traditional structural biology laboratories, since synthesizing coding sequences for direct protein production requires no automation and can be done at modest cost and in little time.
The nucleotide coding regions of a PAZ (Piwi/Argonaute/Zwille) domain and of a deletion mutant (lacking the 5'-3' exonuclease domain) of a family A DNA polymerase (PolA) were selected for synthesis. The sequences of the synthesized genes were derived from the genetic material of two hyperthermophilic microorganisms isolated from mud samples collected at the Rainbow hydrothermal vent field on the Mid-Atlantic Ridge: the sulfur-reducing archaeon Thermococcus thioreducens and an uncharacterized bacterium designated OGL-7B (unpublished). The latter was isolated as a single colony from a 90°C culture of an autoclaved mud sample and could not be cultured further. The PAZ domain and PolA nucleotide sequences, hereafter designated paz and polA, were derived from Thermococcus thioreducens and OGL-7B, respectively.
About 5 μg of pET3a plasmid (Novagen, Madison, WI, USA) was double digested with NdeI and BamHI restriction endonucleases (Promega, Madison, WI, USA) following the manufacturer's instructions. The digested plasmid was purified using the QIAquick PCR purification kit (Qiagen, Valencia, CA, USA), blunt-ended with Klenow fragment (Promega, Madison, WI, USA) according to the manufacturer's directions, and purified again as described above. The resulting pET3a fragment was used for homologous recombination reactions without further modification.
Table: Oligonucleotides for the synthesis of polA
Table: Oligonucleotides for the synthesis of paz
E. coli strain DH5α was rendered competent using the rubidium chloride method. DNA mixtures were gently mixed with 50 μl of competent cells. After 15 min incubation on ice, cells were heat-shocked at 42°C for 80 s. The transformed cells were then diluted with 150 μl Luria-Bertani (LB) medium, incubated for 1 hour at 37°C/250 RPM and spread on LB-agar plates containing 100 mg/L carbenicillin. Colonies were visible after 16 hours of incubation at 37°C. Colonies were picked without screening and grown in 5–10 ml LB supplemented with 100 mg/L carbenicillin at 37°C/250 RPM for 12–16 hours. Plasmids were purified using the EZNA plasmid Miniprep kit II (Omega Bio-Tek, Doraville, GA, USA). Sequencing was performed by Functional Biosciences (Madison, WI, USA) using vector-specific sequencing primers flanking the cloning site, as well as additional gene-specific primers when needed. Both strands of each gene were sequenced. Sequences were assembled and analyzed using the Staden package version 1.6.0.
The two proteins were expressed and purified in exactly the same way unless indicated otherwise. Expression plasmids containing the error-free inserts were transformed directly into competent E. coli BL21(DE3). Competent cells were prepared by the rubidium chloride method and transformed with 10 ng of recombinant plasmid as described above. Transformed cells were mixed with 150 μl LB medium, incubated for 1 hour at 37°C/250 RPM and spread on LB-agar plates containing 100 mg/L carbenicillin. Colonies from overnight plates were used to inoculate 20 mL LB cultures containing 100 mg/L carbenicillin, which were incubated for 8 hours at 37°C/250 RPM. These cultures were then used to inoculate 2 × 2 L of LB medium containing the same antibiotic at the same concentration. The cultures were grown at 37°C/250 RPM, and expression of the recombinant proteins was induced by adding isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 0.5 mM when the optical density at 600 nm reached 0.6.
After 6 hours, cells were harvested by centrifugation. All subsequent steps except the heat treatment were performed at 4°C. Cells were resuspended in buffer A (50 mM TRIS pH 7.5, 50 mM NaCl, 1 mM EDTA for PolA; 50 mM BIS-TRIS pH 6.5, 50 mM NaCl, 1 mM EDTA for PAZ) supplemented with 10 μg/mL deoxyribonuclease I and ribonuclease I (New England BioLabs, Ipswich, MA, USA) and disrupted by sonication (6 cycles of 45 pulses) using a Branson Sonifier 250 (VWR Scientific, West Chester, PA, USA). Cell debris was removed by centrifugation (12000 g, 20 min). The supernatant was heated for 30 min at 75°C and the precipitate was removed by further centrifugation. The supernatant was loaded onto an ion exchange column (HiTrap Q Sepharose for PolA and HiTrap SP for PAZ; Amersham, USA) pre-equilibrated with buffer A, and the protein was eluted with a 0.05–1 M NaCl linear gradient in buffer B using an Akta Explorer FPLC system (Amersham-Pharmacia, USA). Fractions containing the recombinant protein (identified by a band of the expected size on SDS-PAGE) were pooled and dialyzed against buffer A. The protein was concentrated to 2 mL using an Amicon Ultra centrifugal filter device (Millipore, USA), applied to a Sephacryl S-200 gel filtration column (Pharmacia, USA) pre-equilibrated with buffer A, and eluted with the same buffer. Fractions corresponding to the principal peak were pooled and concentrated for crystallization screening. The final concentrations were 15 and 22 mg/mL for PolA and the PAZ domain, respectively. Protein concentration was estimated from UV absorbance at 280 and 260 nm and determined by the Bradford assay using albumin as the protein standard.
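The text does not say how the UV readings were converted into a concentration. One common way to combine A280 and A260 is the empirical Warburg-Christian-type correction, 1.55 × A280 - 0.76 × A260 (mg/mL). The sketch below assumes that formula and a 1 cm path length, neither of which is stated by the authors, and uses hypothetical readings.

```python
def protein_mg_per_ml(a280, a260, dilution=1.0):
    """Estimate protein concentration (mg/mL) from UV absorbance.

    Assumes the empirical correction 1.55*A280 - 0.76*A260, which subtracts
    an approximate nucleic-acid contribution, and a 1 cm path length.
    `dilution` is the factor by which the measured sample was diluted.
    """
    return (1.55 * a280 - 0.76 * a260) * dilution


def a280_a260_ratio(a280, a260):
    """Rough purity check: a low ratio suggests residual nucleic acid."""
    return a280 / a260


# Hypothetical readings of a 1:20 dilution of a purified sample.
conc = protein_mg_per_ml(a280=0.75, a260=0.42, dilution=20)
print(f"estimated concentration: {conc:.1f} mg/mL")
print(f"A280/A260 ratio: {a280_a260_ratio(0.75, 0.42):.2f}")
```

In practice such an estimate is best cross-checked against a dye-based assay, as was done here with the Bradford assay.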
Synthesized DNA products were analyzed by agarose gel electrophoresis as described by Sambrook & Russell. Purified recombinant proteins were evaluated by SDS-PAGE according to the method of Laemmli using 12% polyacrylamide pre-cast gels (Invitrogen, Carlsbad, CA, USA) in the presence of 0.1% SDS. SeeBlue Plus2 Pre-stained standards (Invitrogen, Carlsbad, CA, USA) were used as protein markers in the range of 4–250 kDa.
Crystallization conditions for the purified proteins were screened with Crystal Screen I and II crystallization screening reagents (Hampton Research, Aliso Viejo, CA, USA) by sitting-drop vapor diffusion in IntelliPlates (Art Robbins, Sunnyvale, CA, USA). Diffraction-quality crystals grew within approximately one week at room temperature under conditions optimized from the initial hits obtained with the commercial screens. The growth conditions will be reported in a separate publication together with the structure determinations.
Crystals were examined through a polarizing filter under a Nikon visible-light microscope and photographed with a Kodak Digital Science DC120 camera. The crystals were carefully soaked for 2 min in a cryoprotective solution containing 25% glycerol in the precipitating reagent, mounted in a 20 micron diameter nylon cryoloop (Hampton Research, Aliso Viejo, CA, USA) and flash-frozen directly in liquid nitrogen. The cryopreserved crystals were mounted on a goniometer head in a 100 K nitrogen stream produced by an MSC X-Stream Cryogenic Crystal Cooler System (Molecular Structure Corporation, The Woodlands, Texas, USA). Diffraction data were collected on an MSC R-AXIS IV image plate detector with a crystal-to-detector distance of 250 mm. X-rays were generated by a Rigaku rotating anode generator operated at 50 kV and 100 mA and focused with MSC OSMIC confocal mirrors. Images were collected with 1.0° oscillations and an exposure time of 5 min per image. Data were indexed with DENZO and reduced with SCALEPACK within the HKL2000 program package.
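For orientation, the highest resolution recordable at the detector edge follows from the crystal-to-detector distance and Bragg's law, d = λ / (2 sin θ). The sketch assumes a Cu Kα wavelength of 1.5418 Å (typical of copper rotating anodes, but the anode material is not stated in the text) and a 150 mm usable radius on the image plate, which is an assumption for illustration only.

```python
import math


def edge_resolution(distance_mm, radius_mm, wavelength_a=1.5418):
    """d-spacing (Å) recorded at `radius_mm` from the beam centre.

    Assumes a flat detector perpendicular to the beam with the beam hitting
    its centre, so the scattering angle is 2θ = atan(radius / distance).
    Bragg's law then gives d = λ / (2 sin θ).
    """
    two_theta = math.atan(radius_mm / distance_mm)
    return wavelength_a / (2.0 * math.sin(two_theta / 2.0))


# 250 mm distance is from the text; the 150 mm radius and the Cu Kα
# wavelength (default argument) are assumptions.
d_edge = edge_resolution(distance_mm=250.0, radius_mm=150.0)
print(f"resolution at detector edge: {d_edge:.2f} Å")
```

Moving the detector closer raises the edge resolution (smaller d) at the cost of spot separation, which is the usual trade-off when choosing the distance.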
Several methods for assembling genes from synthetic oligonucleotides have been developed [11–15]. Our effort to include gene synthesis in a gene-to-structure pipeline for non-automated structural genomics prompted us to extend current gene assembly methods for recombinant protein production. Our method is a modification of the thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis method and is referred to as "Sequential TBIO (SeqTBIO)" because of the incremental, successive steps involved. The principle is illustrated in Figure 1. An initial DNA fragment made from 2 central oligonucleotides is extended bidirectionally, one oligonucleotide pair at a time. The main differences from the TBIO method are the number of oligonucleotides per step (1 pair instead of 4 to 6), the number of cycles per step (only 7 instead of 25) and the absence of any gel purification step. Overall, despite the larger number of steps, our method is faster and more robust because each step is simpler and requires less manipulation.
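To make the inside-out logic concrete, here is an in silico sketch of sequential bidirectional extension. The oligonucleotide length (60 nt), overlap (20 nt) and minimum annealing overlap (15 nt) are hypothetical parameters chosen for illustration, not the authors' design rules, and string joining stands in for annealing and polymerase fill-in.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(target, oligo_len=60, overlap=20):
    """Hypothetical SeqTBIO-style plan: a central oligo pair, then one
    (sense, antisense) pair per step extending the product outwards.
    A slot is None once that end of the target has been reached."""
    n = len(target)
    core = 2 * oligo_len - overlap          # length of the central duplex
    if n < core:
        raise ValueError("target is shorter than the central fragment")
    a = (n - core) // 2
    b = a + core
    central = (target[a:a + oligo_len], revcomp(target[b - oligo_len:b]))
    step = oligo_len - overlap              # new bases added per side per step
    pairs = []
    while a > 0 or b < n:
        new_a, new_b = max(0, a - step), min(n, b + step)
        left = target[new_a:new_a + oligo_len] if new_a < a else None
        right = revcomp(target[new_b - oligo_len:new_b]) if new_b > b else None
        pairs.append((left, right))
        a, b = new_a, new_b
    return central, pairs


def overlap_len(upstream, downstream, min_overlap):
    """Longest suffix of `upstream` identical to a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), min_overlap - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return k
    raise ValueError("no sufficient overlap")


def extend(product, left, right, min_overlap=15):
    """Stand-in for one extension step: a sense oligo anneals at the 5' end,
    an antisense oligo (written 5'->3') anneals at the 3' end."""
    if left is not None:
        product = left + product[overlap_len(left, product, min_overlap):]
    if right is not None:
        sense = revcomp(right)
        product = product + sense[overlap_len(product, sense, min_overlap):]
    return product


random.seed(1)
target = "".join(random.choice("ACGT") for _ in range(449))
central, pairs = design_oligos(target)
product = extend(central[0], None, central[1])   # anneal the two central oligos
sizes = [len(product)]
for left, right in pairs:
    product = extend(product, left, right)
    sizes.append(len(product))
print(sizes)
```

Under these assumptions each step adds at most 2 × (oligo_len - overlap) bases, so the number of steps grows roughly linearly with gene length, which is why long genes need many sequential reactions.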
To demonstrate the simplicity of our improved methodology for synthesizing DNA fragments of different sizes, two novel nucleotide sequences, paz and polA, were assembled. The protein encoded by paz is a 15.41 kDa PAZ (Piwi/Argonaute/Zwille) domain, the siRNA-binding domain of an Argonaute protein homologue. The polA gene codes for a deletion mutant of a family A DNA polymerase lacking the 5'-3' exonuclease domain, with a molecular weight of 67.70 kDa. The paz sequence was assembled in just 4 steps (Figure 1A). Each reaction yielded a single band, clearly showing an incremental increase in DNA fragment size. The assembled synthetic gene fragment contained a total of 449 base pairs, including vector-homologous regions at its ends. Two clones were sequenced: one contained 2 point deletions, and the other was error-free and was subsequently used for protein expression.
Gene assembly results for polA are shown in Figure 1B. Each reaction in the synthesis of polA yielded a single band. Because the yield began to decrease after about 20 reactions (reactions 20–22), the last reaction was repeated using more template (2 μl of the 21st reaction product instead of 0.9 μl) and more cycles (15 instead of 7), which was enough to obtain a single pure DNA product (lane F). The assembled synthetic gene fragment contained 1826 bp, including sequences at the 5' and 3' termini homologous to the expression vector. After co-transformation with a linearized vector, plasmids isolated from cultures grown from 4 individual colonies were sequenced. All contained the expected insert, and among the 4 clones a total of 6 single-nucleotide mutations were detected: 1 deletion, 3 transitions and 2 transversions. A clone with a single mutation (the deletion) was selected for error correction.
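The sequencing counts for paz and polA can be turned into a rough per-kilobase error rate. This sketch simply divides observed errors by sequenced length; it assumes errors were counted over the whole assembled fragment (including the vector-homologous ends) and ignores the small number of clones, so the figures are only indicative.

```python
# Counts as reported in the text; treating the full fragment length as the
# sequenced length is an assumption about how errors were counted.
observations = {
    "paz": {"clones": 2, "fragment_bp": 449, "errors": 2},
    "polA": {"clones": 4, "fragment_bp": 1826, "errors": 6},
}


def errors_per_kb(clones, fragment_bp, errors):
    """Observed errors divided by total kilobases sequenced."""
    return errors / (clones * fragment_bp / 1000.0)


for gene, obs in observations.items():
    print(f"{gene}: {errors_per_kb(**obs):.2f} errors/kb")

total_errors = sum(o["errors"] for o in observations.values())
total_kb = sum(o["clones"] * o["fragment_bp"] for o in observations.values()) / 1000.0
print(f"pooled: {total_errors / total_kb:.2f} errors/kb")
```

With only 2 and 4 clones, a single extra or missing error shifts these estimates considerably, so they are best read as order-of-magnitude values.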
We used in vivo homologous recombination to efficiently insert assembly products into a propagating plasmid vector and to rapidly correct synthesis errors. Gene cloning mediated by RecA-independent homologous recombination in E. coli is well documented [17–19], although it has not become a mainstream technique despite its simplicity and efficiency. It relies on the ability of many E. coli strains (including the RecA-deficient strains used for cloning) to perform in vivo intermolecular recombination between DNA fragments sharing homologous sequences at their ends. In our experiments, synthetic gene fragments were quickly subcloned into a linearized target plasmid vector (Figure 2) without restriction digestion, ligation or other enzymatic manipulation. Virtually 100% of the resulting clones contained the correct insert, eliminating the need for screening. In addition, synthetic DNA fragments produced with our method can easily be assembled into larger constructs by in vivo homologous recombination of overlapping fragments. We have successfully synthesized a 3.6 kb gene and a 6 kb gene by assembling two 1.8 kb and three 2 kb fragments, respectively (data not shown).
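At the sequence level, joining fragments through homologous ends can be modeled as overlap joining. The sketch below is only that: a string model with hypothetical fragment sizes and a 25 bp homology arm (the text does not state an arm length), which checks that a linearized vector and one or more inserts with matching ends close into the expected circular construct. It says nothing about recombination efficiency in E. coli.

```python
import random


def junction(upstream, downstream, min_homology):
    """Length of the longest suffix of `upstream` matching a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), min_homology - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return k
    raise ValueError("fragments do not share terminal homology")


def circularize(fragments, min_homology=20):
    """Join linear fragments whose ends overlap, then close the circle.

    Each fragment's 3' end must match the next fragment's 5' end, and the
    last fragment must match the first. The result is written starting at
    the first base of fragments[0].
    """
    joined = fragments[0]
    for frag in fragments[1:]:
        joined += frag[junction(joined, frag, min_homology):]
    closing = junction(joined, fragments[0], min_homology)
    return joined[:-closing]


def random_dna(n):
    return "".join(random.choice("ACGT") for _ in range(n))


random.seed(7)
vector, insert = random_dna(500), random_dna(300)   # hypothetical sizes
cut, arm = 250, 25                                   # hypothetical cloning site and arm length

linear_vector = vector[cut:] + vector[:cut]          # vector opened at the cloning site
insert_with_arms = vector[cut - arm:cut] + insert + vector[cut:cut + arm]
plasmid = circularize([linear_vector, insert_with_arms])

# The same construct assembled from two overlapping halves of the insert.
mid = 150
half1 = vector[cut - arm:cut] + insert[:mid + arm]
half2 = insert[mid:] + vector[cut:cut + arm]
plasmid2 = circularize([linear_vector, half1, half2])
```

Both routes give the same circular sequence, mirroring how overlapping fragments were combined into larger constructs.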
Synthetic genes inherently contain errors, derived mainly from inaccuracies in oligonucleotide synthesis and to a lesser extent from the DNA polymerase-mediated assembly. The error rates observed in our assembly products were consistent with the 1 to 3 errors per kb reported by others, and imply that, especially for larger genes, a prohibitively large number of sequencing reactions may be needed to have a high probability of finding a clone with the correct sequence. Several approaches have been proposed to decrease the error rate, in particular the use of enzymes that recognize mismatches in renatured assembly products [20–22]. However, these may be difficult to implement because of the cost and availability of such enzymes. Therefore, for small-scale gene synthesis projects, it is often simpler to correct errors by site-directed mutagenesis (SDM). Numerous SDM techniques have been described over the last two decades, many of which involve more than one PCR step, additional enzymes or further complex manipulations [23, 24]. We applied an SDM method based on in vivo homologous recombination using a single PCR step, which is both simple and efficient [18, 25]. In its simplest form, competent cells are directly transformed with the products of a single amplification step that generates overlapping fragments.
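The scaling behind this statement can be sketched with a simple model. Assuming errors are independent and Poisson-distributed at a fixed rate per kb (an idealization; the text does not propose a model), the chance that a clone is error-free decays exponentially with gene length, and the number of clones to sequence rises accordingly. The rates below are the 1 and 3 errors/kb quoted in the text; the gene lengths are illustrative.

```python
import math


def p_error_free(errors_per_kb, length_kb):
    """Probability that a clone has zero errors under a Poisson model."""
    return math.exp(-errors_per_kb * length_kb)


def clones_needed(errors_per_kb, length_kb, confidence=0.95):
    """Smallest number of clones giving at least `confidence` probability
    that one or more of them is error-free."""
    p = p_error_free(errors_per_kb, length_kb)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


for rate in (1.0, 3.0):
    for length in (0.45, 1.8, 3.6):
        print(f"{rate} errors/kb, {length} kb: "
              f"P(error-free) = {p_error_free(rate, length):.3f}, "
              f"clones for 95% confidence: {clones_needed(rate, length)}")
```

Because the required number of clones grows exponentially with length, correcting a few errors in one nearly correct clone is usually cheaper than screening for a perfect one.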
The error correction strategy using in vivo homologous recombination is illustrated in Figure 2. We approached it in two ways. In the first, primers are designed to produce corrected fragments of the gene assembly product (Figure 2A). In this case, two, three and four pairs of primers (F1-R1, F2-R2, F3-R3 and F4-R4) are required to correct one, two and three error sites, respectively, each pair being used in a separate reaction. The resulting PCR products are mixed with the linearized plasmid vector and used to transform competent cells, in which in vivo homologous recombination takes place. In our hands, as many as 4 PCR-amplified corrected fragments recombined accurately with the recipient vector and generated error-corrected products.
The second approach involves amplifying DNA fragments that include the plasmid vector (Figure 2, panels B-D). Typically, up to 3 point mutations are corrected at one time. In such a case, two correcting primers that are reverse complements of each other are designed at each mutation site, with the correcting nucleotide at the center of each primer. DNA fragments are amplified by PCR using primer pairs F1-R1, F2-R2 and F3-R3 in 3 separate reactions. For 2 point mutations, two pairs of primers are used in the same way, giving only 2 separate reactions. When a single synthesis error needs to be corrected, a non-mutagenic primer set corresponding to a sequence in the vector backbone is used in addition to the correcting primer set, so that 2 fragments are generated (as if 2 corrections were being made); this avoids using mutually annealing primers in a single reaction.
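The primer rules in these two paragraphs can be written down directly. The sketch assumes a 31-nt primer length (the text gives no length) and plain string handling; it builds the complementary correcting pair centred on the restored base and counts the PCR fragments each strategy needs. The example sequence and the deletion site are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def correcting_primers(corrected_seq, pos, length=31):
    """Complementary mutagenic primer pair centred on the corrected base.

    `corrected_seq` is the intended (error-free) sequence and `pos` the
    0-based index of the base being restored or changed. An odd length keeps
    that base exactly in the middle of both primers.
    """
    if length % 2 == 0:
        raise ValueError("use an odd primer length so the base is centred")
    half = length // 2
    if pos - half < 0 or pos + half >= len(corrected_seq):
        raise ValueError("site too close to the sequence end for this length")
    fwd = corrected_seq[pos - half:pos + half + 1]
    return fwd, revcomp(fwd)


def pcr_fragments_needed(n_errors, amplify_whole_plasmid):
    """Fragment counts as described in the text: gene-only correction needs
    n + 1 fragments; whole-plasmid correction needs n, but at least 2
    (a vector-backbone primer pair is added for a single error)."""
    if amplify_whole_plasmid:
        return max(2, n_errors)
    return n_errors + 1


random.seed(3)
intended = "".join(random.choice("ACGT") for _ in range(300))
site = 120
template = intended[:site] + intended[site + 1:]   # clone carrying a 1-nt deletion
fwd, rev = correcting_primers(intended, site)
print(fwd, rev, sep="\n")
```

Each correcting primer is one half of a pair on its own fragment, so the two fragments share the corrected sequence as their homologous overlap.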
To correct the point deletion in polA, the two strategies were pursued in parallel. In the first, 2 overlapping fragments of the same plasmid were amplified (Figure 2, panel B), each with a primer correcting the deletion and a primer corresponding to a sequence in the vector backbone. The primers used for one fragment were reverse complements of those used for the other. In the second approach, only the gene synthesis product was amplified from the plasmid template, in 2 fragments, each amplified with a correcting primer and a terminal oligonucleotide from the original gene synthesis. The amplification products were then mixed with the linearized vector, allowing in vivo recombination between the 3 fragments (Figure 2, panel A). In both cases, 40 pg/μl of plasmid template was used in the correcting PCR mixture. Between 10 and 100 colonies were obtained per transformation.
A random selection of colonies was analyzed for correct recombinant inserts. With both correction procedures, the correction efficiency was 50%: half of the clones analyzed contained a completely accurate insert sequence. A corrected clone from the first approach was selected for protein expression, to test the integrity of the plasmid despite possible mutations introduced into the vector sequence.
An obvious concern when plasmids are used as PCR templates is that they may compete with the desired recombination products after transformation. Even though the amplified linear DNA is several orders of magnitude more concentrated than the circular plasmid template in the PCR product, homologous recombination is a rare event, so even a modest amount of circular template can yield unwanted non-recombined clones after transformation. Several strategies have been proposed to address this problem, including gel purification of the amplified DNA fragment, linearization of the plasmid template by restriction digestion prior to PCR, and treatment of PCR products with DpnI, which cleaves methylated DNA. To minimize the number of steps and simplify the method, we attempted to prevent plasmid carry-over by diluting the plasmid template in the correcting PCR. In the case of polA, this strategy worked, but the dilution level was not optimal, since only half of the clones were recombination products.
When applying the same error correction technique to other genes, we found that diluting the template further, to 3 pg/μl or less in the correcting PCR, consistently gave 100% correction efficiency, reducing template carry-over to negligible levels. However, there was an unexpected consequence: new mutations appeared in otherwise corrected clones. Despite the use of one of the highest-fidelity DNA polymerases commercially available, unintended mutations appeared when plasmid templates were diluted to 10 pg/μl or less. The mutation rate appeared to correlate with the degree of template dilution, exceeding 1 mutation per kb when the template concentration was below 1 pg/μl. All new mutations involved a single nucleotide. A compilation of 79 mutations detected after sequencing 94 kb of corrected clones showed that transitions predominated (59% of all mutations observed), with equal numbers of type 1 (A to G and T to C) and type 2 (G to A and C to T), followed by deletions (28%), transversions (9%) and insertions (4%). Note that in all cases where new mutations were observed, the template concentration was well below the 100 to 600 pg/μl of plasmid template recommended by the DNA polymerase manufacturer. To obtain reproducible 100% error correction without introducing additional mutations, we found it necessary to keep the template concentration in the correcting PCR within the manufacturer's recommended range and therefore to include a template removal step before transformation.
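The mutation categories used here (type 1 and type 2 transitions, transversions, insertions and deletions) can be tallied automatically once mutations have been extracted from sequencing alignments as (reference, observed) base pairs. That extraction step is assumed, and the example list is hypothetical; only the 79 mutations in 94 kb figure comes from the text.

```python
from collections import Counter

BASES = set("ACGT")
TYPE1 = {("A", "G"), ("T", "C")}   # transition classes as defined in the text
TYPE2 = {("G", "A"), ("C", "T")}


def classify(ref, alt):
    """Classify a single-nucleotide change; '-' marks a gap."""
    if ref == "-" and alt in BASES:
        return "insertion"
    if alt == "-" and ref in BASES:
        return "deletion"
    if (ref, alt) in TYPE1:
        return "transition (type 1)"
    if (ref, alt) in TYPE2:
        return "transition (type 2)"
    if ref != alt and {ref, alt} <= BASES:
        return "transversion"   # every remaining base-to-base change
    raise ValueError(f"not a single-nucleotide change: {ref}->{alt}")


def spectrum(changes):
    """Percentage of each mutation class in a list of (ref, alt) pairs."""
    counts = Counter(classify(ref, alt) for ref, alt in changes)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Hypothetical example calls, not data from the study.
example = [("A", "G"), ("C", "T"), ("G", "-"), ("A", "C"), ("-", "T"), ("T", "C")]
for kind, pct in sorted(spectrum(example).items()):
    print(f"{kind}: {pct:.0f}%")

# Overall rate implied by the compiled figures in the text.
print(f"{79 / 94:.2f} new mutations per kb sequenced")
```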
Our observation of increased mutation rates with highly diluted templates is consistent with a previous report that low-copy-number templates can decrease the fidelity of both Taq and Pfu DNA polymerases. To our knowledge, no explanation has been proposed for this phenomenon. However, it could have serious implications for PCR methods involving samples of limited availability, such as ancient DNA, forensic samples or pre-implantation genetic diagnosis. Further studies are needed to determine whether this phenomenon affects PCR in general and how the sensitivity to template concentration varies among DNA polymerases and reaction conditions.
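Because the fidelity effect is framed in terms of template copy number, it can help to convert the mass concentrations used here into molecule counts. The sketch assumes an average of 650 g/mol per base pair of double-stranded DNA and a hypothetical 6.4 kb expression construct; the concentrations are those mentioned in the text, together with the ends of the manufacturer's recommended range.

```python
AVOGADRO = 6.022e23       # molecules per mole
MEAN_BP_MASS = 650.0      # g/mol per base pair of dsDNA (common approximation)


def copies_per_ul(pg_per_ul, plasmid_bp):
    """Convert a plasmid concentration in pg/μl to template molecules per μl."""
    grams = pg_per_ul * 1e-12
    return grams * AVOGADRO / (plasmid_bp * MEAN_BP_MASS)


PLASMID_BP = 6400         # hypothetical construct size
for conc in (600, 100, 40, 10, 3, 1):
    print(f"{conc:>4} pg/μl -> {copies_per_ul(conc, PLASMID_BP):.2e} copies/μl")
```

Even at 1 pg/μl the estimate is on the order of 10^5 molecules per μl, so "low copy number" here is relative to the manufacturer's recommended input rather than close to single-molecule PCR.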
Error correction has been the limiting factor in achieving accurate gene synthesis. We corrected sequence errors using site-directed mutagenesis coupled with homologous recombination [18, 25] because of its simplicity and efficiency. Two tactics were considered, and both offer fast repair and subcloning independent of restriction site availability and ligation. One involves amplifying the entire circular plasmid by PCR using mutagenic primers with overlapping sequences, while the other uses mutagenic primers to amplify only the assembled synthetic gene fragment. Although we obtained successful protein expression from constructs corrected with the former, we observed higher success rates in subsequent expression trials with the latter. This is because full-length amplification of the plasmid can introduce second-site mutations in the vector sequence. Consequently, the expression plasmid may become defective in replication, transcription or translation, reducing the number of transformed cells able to overexpress the target gene product. In most cases, however, the error-corrected insert can be subcloned into another vector. When errors are corrected by amplifying only the assembled synthetic gene fragment, there is little risk of altering the expression vector sequence, since no part of the plasmid is amplified. As a result, the plasmid can be used for protein expression with greater confidence that any failure is not caused by alterations in the vector sequence.
Coupling a more reliable DNA synthesis approach with an efficient means of error correction, without any additional purification or enzymatic requirements, provides a rapid way to clone and express proteins when the source of genomic or complementary DNA is limited or unavailable. This is particularly useful in crystallographic studies where specific amino acid sites or regions must be altered to improve overexpression or crystal packing. Protein crystallization is often very sensitive to inter- and intramolecular packing, in which, for example, producing optimal ionic network interfaces through the apposition of extensive hydrophobic surfaces, or even constructing protein chimeras, can determine whether a crystal structure is obtained. In addition, strategic incorporation of sulphur-containing residues [32, 33], or of methionine for selenomethionine replacement, would be very useful for building protein molecules with intrinsic atoms suited to crystallographic phasing (e.g. single-wavelength anomalous diffraction).
The paz and polA genes are shown here as examples of how we have used the gene synthesis technique to produce novel proteins for crystallization. When the natural coding sequences of both proteins were cloned and prepared for recombinant expression in E. coli, all attempts to produce the proteins failed. Both sequences showed codon usage that diverged significantly from that of E. coli. Recombinant expression was obtained only after the codons were optimized.
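The text does not describe how codons were optimized. A minimal sketch of one simple strategy (often called "one amino acid, one codon") back-translates the protein using a single preferred E. coli codon per residue and reports the fraction of codons commonly cited as rare in E. coli. The codon choices, the rare-codon list and the native fragment are illustrative assumptions; practical optimization tools also weigh GC content, mRNA structure and repeats, which this sketch ignores.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative choice of one frequent E. coli codon per amino acid
# (an assumption, not the authors' design table).
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}

# Codons often cited as rare in E. coli (illustrative list).
RARE = {"AGA", "AGG", "ATA", "CTA", "CCC", "CGG", "GGA"}


def codons(dna):
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def translate(dna):
    return "".join(CODON_TABLE[c] for c in codons(dna))


def back_translate(protein):
    return "".join(PREFERRED[aa] for aa in protein)


def rare_codon_fraction(dna):
    cs = codons(dna)
    return sum(c in RARE for c in cs) / len(cs)


native = "ATGAGAAGGATACTACCCGGAAAATAG"   # hypothetical native ORF fragment
protein = translate(native)
optimized = back_translate(protein)
print(protein, optimized)
print(rare_codon_fraction(native), rare_codon_fraction(optimized))
```

Since the synthetic gene is designed from the protein sequence, codon choice costs nothing extra at the synthesis stage.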
In our pipeline, large-scale expression is routinely followed by an initial heat selection and two chromatographic steps, ion exchange and size exclusion chromatography, after which the proteins are usually screened immediately for crystallization. For PAZ and PolA, crystals were obtained in the initial screening conditions and subjected to preliminary X-ray analysis. Crystallization conditions can be optimized in many ways, as discussed previously [16, 35, 36]. If proteins crystallize poorly or not at all after an exhaustive search of crystallization conditions, modifying the protein itself is a reasonable next step. Such changes can readily be made at the synthesis level, where strategic amino acid residues or domains can be rearranged at will to optimize molecular packing. One interesting but untried possibility is to express the protein as incremental fragments corresponding to the inside-out direction of synthesis of the recombinant open reading frame. Since gene synthesis is incremental, the product of each assembly cycle can be subcloned and tested for protein expression and crystallization. This approach would be of great interest for examining the effects of N- and C-terminal truncations on molecular stability, packing and, ultimately, crystallization. Other variations include domain switching or elimination and symmetry imposition, and there are countless other ways in which incremental changes to protein fragments may give rise to novel assemblies. The gene synthesis technique reported here integrates well into a gene-to-crystal pipeline and is applicable to small structural genomics projects as well as protein engineering studies.
The SeqTBIO method of gene synthesis, combined with in vivo homologous recombination-mediated cloning and error correction, was shown to integrate well into a gene-to-structure pipeline. As illustrated by the paz and polA genes, gene assembly was simple and efficient, and required neither reagents or enzymes other than those used in PCR nor any complex manipulation. Cloning and error correction were straightforward, requiring nothing more than transforming competent cells with amplified gene fragments. The described methodology for generating cloned genes of any sequence for protein expression and structural studies requires neither expensive equipment nor particular technical skill, and can therefore benefit laboratories with limited resources.
The development of a structural genomics mini-pipeline was supported in part by NSF STTR-0611274 and NSF-EPSCoR (EPS-0447675).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.