A new method to customize protein expression vectors for fast, efficient and background free parallel cloning

Background Expression and purification of correctly folded proteins typically require screening of different parameters such as protein variants, solubility enhancing tags or expression hosts. Parallel vector series that cover all variations are available, but not without compromise. We have established a fast, efficient and background-free cloning approach that can be applied to any selected vector.
Results Here we describe a method to tailor selected expression vectors for parallel Sequence and Ligation Independent Cloning (SLIC). SLIC enables precise and sequence-independent engineering and is based on joining vector and insert with 15–25 bp homologies on both DNA ends by homologous recombination. We modified expression vectors based on pET, pFastBac and pTT backbones for parallel PCR-based cloning and screening in E. coli, insect cells and HEK293E cells, respectively. We introduced the toxic ccdB gene under control of a strong constitutive promoter for counterselection of insert-less vector. In contrast to DpnI treatment, which is commonly used to reduce vector background, ccdB as used in our vector series is 100% efficient in killing cells that carry parental vector and reduces vector background to zero. In addition, the 3' end of ccdB functions as a primer binding site common to all vectors. The second shared primer binding site is provided by an HRV 3C protease cleavage site located downstream of purification and solubility enhancing tags for tag removal. We have so far generated more than 30 different parallel expression vectors, and successfully cloned and expressed more than 250 genes with this vector series. There is no size restriction for gene insertion, and cloning efficiency is > 95% with clone numbers up to 200. The procedure is simple, fast, efficient and cost-effective. All expression vectors showed efficient expression of eGFP and of the different target proteins that were requested from our Core Facility services.
Conclusion This new expression vector series allows efficient and cost-effective parallel cloning and thus screening of different protein constructs, tags and expression hosts.


Background
As central Core Facility Labs at the Max-Planck Institute of Biochemistry and the EMBL we provide in-house services for recombinant protein production. The proteins we are asked to produce are from various sources and protein families and are used for crystallization, immunization, biochemical, biophysical or biological studies. We perform in-depth protein analysis to ensure that delivered proteins are properly folded. However, in many cases this is a fairly challenging task, despite the many options that can improve protein folding such as construct design, solubility enhancing fusion tags, expression conditions, expression hosts and improved protein purification protocols. The constant challenge is to identify a successful combination of parameters with a minimum of resources. Apart from unbiased HTP approaches [1] or targeted selection [2,3], it still remains a time-consuming trial and error process in most non-automated protein labs. We had initially focused our screening efforts on constructs and solubility tags in E. coli as a first choice.
Eukaryotic hosts were used if suggested by the literature or previous experience, or in case E. coli screening had failed. As this happened frequently, we decided to implement parallel testing of constructs, solubility tags and expression hosts altogether. In order to handle all these different expression constructs, an efficient parallel cloning method was required. In the past few years, a number of powerful combinatorial cloning methods have been introduced. Gateway technology (Life Technologies) opened the field of combinatorial cloning more than a decade ago. More recent additions to the list are type II restriction enzymes (Stargate, IBA, Germany), Golden Gate Shuffling [4], RF cloning [5] and Sequence and Ligation Independent Cloning (SLIC) [6]. For recombinant protein expression, several commercial (Novagen, IBA, Life Technologies) and non-commercial [7] parallel vector series are available. However, we had already established different expression systems with their respective vectors. The E. coli pETM vector series have different tags followed by a protease cleavage site and are based on the same pET backbone [8]. Differences in expression levels are therefore exclusively due to the different tags, not to backbone differences such as copy number, spacer sequences or others. Expression uses the powerful T7 promoter system at low vector copy number, and plasmids confer kanamycin resistance instead of ampicillin resistance, which is crucial for plasmid stability [9], particularly in high cell density fermentations. Transient transfection of mammalian cells is another efficient and fast method for protein production. The HEK293 (human embryonic kidney) cell line is widely used due to its high transfection efficiency and suspension growth in serum-free media. A variety of HEK293 cells are currently available with significant differences in productivity [10].
The Epstein-Barr virus nuclear antigen 1 (EBNA1) in the HEK293E cell line interacts with oriP on pTT vectors, which increases plasmid persistence and protein expression levels [11]. Therefore we preferred to customize pETM, pFastBac and pTT vectors for parallel cloning rather than adapt to a new system. The method must be sequence and restriction-site independent to allow for incorporation of any DNA fragment into any of the vectors. It must be directional and precise and, most importantly, it must be fast, simple, efficient and cost-effective. SLIC enables sequence-independent, precise cloning with minimal or no changes in the amino acid sequence of the target protein and is based on homologous recombination of vector and insert with 15-25 bp homologies on both DNA ends. The reaction is enhanced using either T4 DNA polymerase, recA protein or incomplete PCR products [6]. In order to adapt vectors for SLIC, they need to share a common stretch of nucleotides at both ends of the linearized plasmid. We have chosen the HRV 3C protease [12] cleavage site and the 3' end of the toxic ccdB gene to serve as primer binding sites common to all of the new vectors (Figure 1). The HRV 3C recognition site is located downstream of the N-terminal purification or solubility enhancing tag and can be used for tag removal. The CcdB protein inhibits bacterial growth by selective inhibition of E. coli DNA gyrase and can be neutralized by the antitoxin ccdA. The ccdB technology was developed by Delphi Genetics [13] and is also licensed for use in Gateway vectors (Life Technologies). In the SLIC strategy presented here, the ccdB gene introduced into the vector series was designed for strong constitutive expression in order to suppress growth of non-resistant cells at 100% efficiency. The vector is used as a PCR template for the amplification of the linear vector fragment, from which ccdB is deleted.
The ccdB gene on the template thus prevents the carry-over of the original vector during purification and SLIC reaction by preventing growth of colonies not containing the gene of interest. We developed pET, pFastBac and pTT parallel cloning vectors, named pCoofy 1-x, and present protein expression data in each of the respective host organisms.

Figure 1 Principle of parallel SLIC cloning with negative ccdB selection. The vector is PCR linearized with LP1 forward and LP2 reverse primer. The LP1 primer corresponds to the PreScission protease site (3C) for tag removal. The LP2 primer is either located at the C-terminus of ccdB or corresponds to a C-terminal tag. In both cases the ccdB gene is deleted upon PCR amplification, thereby allowing counterselection of parental empty vector in ccdB sensitive cells. The Gene of Interest (GOI) is PCR amplified with primers composed of 5' and 3' gene specific sequences plus 15-25 bp extensions complementary to LP1 and LP2 vector primers, respectively.

Results and discussion
Vector design and cloning strategy
In order to drive strong constitutive ccdB expression from pCoofy vectors, we used the promoter of the major outer membrane lipoprotein Lpp, which is one of the strongest promoters in E. coli [14]. We inserted the respective llp5 promoter variant and a Shine-Dalgarno sequence upstream of the ccdB coding sequence (Figure 2A) to ensure translation. pPCRScript-LPP5-ccdB (Figure 2B) and all pCoofy ccdB derivatives show 100% killing efficiency when transformed into non-resistant cells. Occasionally, we observed ccdB inactivation during plasmid propagation in ccdB Survival™ cells (Life Technologies) under high selective pressure, for example with high copy number plasmids. Therefore killing activity has to be verified for every single batch of vector DNA. However, since we use the vectors only as templates for PCR linearization, the amount of DNA used for a single PCR is very low and the product is usually sufficient for multiple SLIC reactions. Typically, 1 ng plasmid DNA is needed for a single SLIC reaction. We first cloned Llp5-ccdB into pETM14, pETM22, pETM33 and pETM44 to generate pCoofy1, 2, 3 and 4 (Figure 3A and Table 1). The parallel SLIC cloning procedure was established using eGFP as gene of interest according to the strategy illustrated in Figure 1A. pCoofy vectors were PCR linearized with 3C-LP1 forward and ccdB-LP2 reverse primer (Table 2). The LP1 primer corresponds to the HRV 3C protease site; the LP2 primer is located at the C-terminus of ccdB in order to delete the gene upon PCR amplification. eGFP was PCR amplified with primers composed of gene specific sequences plus 20 bp and 25 bp extensions complementary to LP1 and LP2 vector primers, respectively (Figure 3B).
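The insert primer layout described above can be sketched in a few lines of Python. This is a minimal illustration only: the overlap sequences below are hypothetical placeholders (the actual LP1 and LP2 primer sequences are those listed in Table 2 and are not reproduced here), and the function name, the 20 nt gene-specific length and the example gene fragment are assumptions made for the sketch.

```python
# Sketch of SLIC insert primer design: gene-specific sequence plus a
# 15-25 nt 5' extension that matches the linearized vector end.
# All sequences below are hypothetical placeholders, not the Table 2 primers.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_insert_primers(gene: str, lp1_overlap: str, lp2_overlap: str,
                          anneal_len: int = 20) -> tuple[str, str]:
    """Build forward and reverse insert primers (both written 5'->3').

    lp1_overlap: top-strand vector sequence directly upstream of the insert.
    lp2_overlap: top-strand vector sequence directly downstream of the insert.
    anneal_len:  length of the gene-specific part (assumed value).
    """
    for name, overlap in (("lp1_overlap", lp1_overlap),
                          ("lp2_overlap", lp2_overlap)):
        if not 15 <= len(overlap) <= 25:
            raise ValueError(f"{name} should be 15-25 nt long")
    if len(gene) < anneal_len:
        raise ValueError("gene is shorter than the gene-specific primer part")
    forward = lp1_overlap + gene[:anneal_len]
    # The reverse primer anneals to the top strand, so it is the reverse
    # complement of (gene 3' end + downstream vector sequence).
    reverse = reverse_complement(gene[-anneal_len:] + lp2_overlap)
    return forward, reverse


if __name__ == "__main__":
    example_gene = ("ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC"
                    "ATCCTGGTCGAGCTGGACGGCTAA")       # placeholder fragment
    lp1 = "CTGGAAGTTCTGTTCCAGGG"                       # 20 nt placeholder
    lp2 = "GATCCGGCTGCTAACAAAGCCCGAA"                  # 25 nt placeholder
    fwd, rev = design_insert_primers(example_gene, lp1, lp2)
    print("forward:", fwd)
    print("reverse:", rev)
```

The resulting PCR product carries the vector-matching sequences on both ends, which is what allows the same insert to be recombined into every vector that shares these two primer binding sites.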
When the SLIC reaction was carried out with insert and vector at a molar ratio of 1:3 without any treatment prior to transformation into chemocompetent OmniMAX™ 2 T1R cells, cloning efficiency was below 70%. Addition of recA raised overall cloning efficiency to > 95%. T4 DNA polymerase treatment of vector and insert [6] was equally efficient, but for simplicity we continued with the recA protocol. We tested several other variations of the basic protocol, but none of these further improved cloning efficiency: number of PCR cycles, PCR without extension step, extended LP1 and LP2 primer length for vector and insert PCR amplification, amount of vector and insert, molar ratio of vector and insert, and 5 min 95°C denaturation of the vector and insert mix followed by slow renaturation at 22°C (data not shown). The E. coli cells used for transformation may have an impact on the quantity and quality of recombination events and should be tested first. At the Max-Planck Institute we use chemocompetent OmniMAX™ 2 T1R cells with a typical transformation efficiency of 10^7/μg pUC plasmid DNA.
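As a worked illustration of the molar ratio mentioned above, the following sketch converts a molar insert-to-vector ratio into DNA masses. It is not part of the published protocol: the function names and the example fragment sizes are hypothetical, the ratio is passed in explicitly as moles of insert per mole of vector, and the average mass of about 650 g/mol per base pair is a common approximation.

```python
# Convert a molar insert:vector ratio into DNA masses for a SLIC reaction.
# Assumption: double-stranded DNA has an average mass of ~650 g/mol per bp.

AVG_BP_MASS_G_PER_MOL = 650.0  # approximation, not a value from the paper


def dsdna_fmol(mass_ng: float, length_bp: int) -> float:
    """Approximate amount (fmol) of a dsDNA fragment of given mass and length."""
    return mass_ng * 1e6 / (length_bp * AVG_BP_MASS_G_PER_MOL)


def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float) -> float:
    """Mass of insert that gives `insert_per_vector` moles per mole of vector.

    Mass scales with moles times length, so the per-bp mass cancels out.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


if __name__ == "__main__":
    # Hypothetical example: 100 ng of a 6000 bp linearized vector and a
    # 720 bp insert, with one mole of insert per three moles of vector.
    ng = insert_mass_ng(100, 6000, 720, 1 / 3)
    print(f"insert needed: {ng:.1f} ng")
    print(f"vector amount: {dsdna_fmol(100, 6000):.1f} fmol")
```

Because only the length ratio enters the mass calculation, the same helper can be reused unchanged when the molar ratio or the amounts are varied during optimization.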

Vector list and cloning statistics
The list of E. coli pCoofy vectors was extended by modifying additional pETM vectors or by introducing His10, OneStrep, S or Halo tags from templates listed in Table 1. All N-terminal tags are followed by the HRV 3C recognition site Leu-Phe-Gln/Gly-Pro. Specific cleavage occurs between Gln and Gly, with Gly-Pro remaining at the N-terminus of the target protein. In order to express proteins that have to retain their native N-terminus after tag removal, we generated ccdB versions of the pET28M-Sumo1 and pET28M-Sumo3 vectors. The SUMO (Small Ubiquitin-like Modifier) tag is recognized and removed by SUMO protease in a structure specific manner to yield the target protein with its native N-terminus [15]. Cloning a target gene into pCoofy5 and pCoofy6 requires the corresponding Sumo-LP1 primer for vector and insert PCR amplification (Table 2). We also generated E. coli vectors with either a C-terminal His10 or OneStrep tag (Table 1), which require the corresponding LP2 vector and insert primer for SLIC cloning (Figure 1, Table 2). Moreover, constructs without any N-terminal tag can be generated using LP1 tagless primers for the appropriate backbone. N-tagless primers have so far been validated and used for pET and pFastBac backbones (Table 2). In order to further increase C-tag variations but at the same time reduce the number of vectors to be generated, we designed a 2nd generation ccdB cassette (Figure 3C). Llp5-ccdB is followed by a row of C-terminal tags, all separated by a stop codon.
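The effect of HRV 3C cleavage on the N-terminus of the target protein can be illustrated with a short sketch. It uses only the Leu-Phe-Gln/Gly-Pro motif as written above in one-letter code; the function name and the example fusion sequence are hypothetical and serve purely as illustration.

```python
# Illustrative in-silico HRV 3C cleavage: the cut falls between Gln (Q) and
# Gly (G) of the motif LFQ/GP, so Gly-Pro stays on the target protein.

HRV3C_MOTIF = "LFQGP"   # Leu-Phe-Gln/Gly-Pro as given in the text
CUT_OFFSET = 3          # cleavage after the third residue of the motif (Q)


def cleave_hrv3c(fusion: str) -> tuple[str, str]:
    """Split a fusion protein at the first HRV 3C site.

    Returns (tag_fragment, target_fragment); raises ValueError if no site
    is present in the sequence.
    """
    idx = fusion.find(HRV3C_MOTIF)
    if idx == -1:
        raise ValueError("no HRV 3C recognition motif found")
    cut = idx + CUT_OFFSET
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # Hypothetical His6 fusion: tag, linker, 3C site, start of the target.
    fusion = "MKHHHHHHSAGLEVLFQGPMVSKGEE"
    tag, target = cleave_hrv3c(fusion)
    print("tag fragment:   ", tag)
    print("target fragment:", target)
```

The target fragment always starts with Gly-Pro, which is the reason the SUMO-based vectors are offered for proteins that must retain their native N-terminus.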
Depending on the LP2 primer used for vector linearization and insert amplification, either no tag or the His10, S, OneStrep, CBP, HPC4 or AE54CPD54 self-cleaving tag is fused to the C-terminus of the protein. Except for the AE54CPD54 self-cleaving tag, C-terminal tags lack a protease cleavage site and cannot be removed. At this time we have cloned this ccdB-C-tag cassette into pTT5 (Figure 3D) and validated eGFP and target gene expression in mammalian cells. pCoofy derivatives of pFastBac1 are currently available with His6, His6GST and His6MBP N-terminal tags (Figure 3E, Table 1). We have effectively cloned more than 250 inserts into all different vectors of the pCoofy series. For all constructs, the DNA sequence of the translated gene fusion was verified. We did not sequence the vector backbone of recombinant constructs, as we have never observed any compromised vector function. Insert sizes ranged between 150 and 3939 bp, with a majority in the range of 500-1000 bp. The number of clones per SLIC reaction varied, with an average of about 20 throughout the entire distribution of insert sizes. As we never observed any background, we were not concerned if clone numbers were low, as the clone was correct in almost all cases (Figure 4).

Figure 3 Representative maps of parallel ccdB vectors for protein expression in E. coli, Baculovirus and HEK293E. For each of these hosts, one example of the pET (E. coli) (A), pFastBac (Baculovirus) (E) and pTT (HEK293E) (D) ccdB vector series is shown. All three backbones share common LP1 (3C) and LP2 (ccdB) primer binding sites for parallel cloning. (B) Primer design is illustrated for the universal PreScission 3C-ccdB primer pair of the pCoofy vector series. Gene specific primer sequences are fused to sequence overhangs of 20-24 nucleotides which are complementary to the corresponding vector amplification primer (see Table 2). (C) 2nd generation ccdB cassette including C-tags. The ccdB cassette as shown in Figure 2A, starting at TTGACA (−35 region), was fused to a row of C-terminal tags, each separated by a stop codon. The LP1 (3C) sequence was added upstream of the ccdB coding sequence. Restriction sites at both ends were also added for subsequent cloning into different vector backbones. The complete cassette was synthesized by GeneArt (now part of Life Technologies) with concomitant codon optimization of the tags for eukaryotic expression.

Expression in E.coli
Prior to use for requested target proteins, every new vector was validated with eGFP for cloning and small scale expression in the respective host. In case the expression level was unexpectedly low, we removed the vector from the list. For example, a His10Trx-eGFP construct was expressed at about 20% of total protein in E. coli total cell lysate. When the double tag was switched to TrxHis10-eGFP, the expression level increased to more than 70% (data not shown). Figure 5A shows a comparison of expression level and solubility of eGFP fused to several purification and solubility enhancing tags in E. coli. Trx, MBP and NusA protein fusions show the highest solubility, as reported previously [8], and also the highest expression level at up to 80% of total cellular protein. Interestingly, His6GST-eGFP was expressed at a high level but with low solubility. This result corroborates our previous observation that His6GST expressed from the original pETM33 vector was insoluble (data not shown). Most of the E. coli expression data for requested target proteins at our Protein Production Service were collected for the first pCoofy vectors 1-4, corresponding to N-terminal His6, His6Trx, His6GST and His6MBP tags. In agreement with the eGFP expression data, MBP had a major impact on protein solubility (Figure 5B). This is also exemplified by E. coli expression of two Pil protein mutants in pCoofy1, 2, 3 and 4 (Figure 5C): the target protein accounts for 50% to 80% of the protein in E. coli total cell lysate, with lower or no expression of the His6GST fusion protein. In the course of the project, only the His6MBP-Pil fusion proteins remained soluble after tag removal with HRV 3C protease (data not shown).

Expression in insect cells
pCoofy vectors allow for parallel screening in E. coli and insect and/or mammalian cells, which has increased protein production throughput in our facilities substantially. Here we show two examples, Vasp and ODC, where parallel cloning allowed us to switch from the E. coli expression host to the Baculovirus expression system without much delay. In the case of Topoisomerase 1, Baculovirus expression of two parallel constructs improved project progress.
Expression of GST-Vasp in E. coli was described previously [16]. When we expressed GST-Vasp in E. coli we observed partial proteolytic degradation and also coaggregation of degradation products with full-length protein. Instead of investing time in optimizing bacterial expression, we cloned Vasp into the pFastBac derivatives pCoofy28 (His6GST) and pCoofy29 (His6MBP) and expressed both constructs in SF9 cells without any signs of degradation (Figure 6A). Purified full-length His6GST-Vasp was shown to be biologically active (data not shown). Ornithine decarboxylase (ODC) was properly folded when expressed in E. coli, but for coexpression purposes it had to be purified from insect cells. We therefore cloned the ODC gene into pCoofy27 (His6) and expressed the ODC protein in High Five cells (Figure 6B). Again, SLIC cloning enabled a rapid change of expression host. Expression of GST-Top1 in insect cells was described previously [17]. We cloned Top1 into pCoofy28 (His6GST) and pCoofy29 (His6MBP) in parallel and tested expression in High Five cells. His6MBP-Top1 showed a much higher expression level than His6GST-Top1 (Figure 6C) and was purified to homogeneity in enzymatically active form (data not shown).

Expression in HEK293E cells
Protein purification from HEK293E cells is, in our hands, not very efficient using standard immobilized metal affinity purification. We therefore introduced alternative C-terminal purification tags into pTT5 (Figure 3C). Insertion of the ccdB-C-tag cassette into pTT5 increases plasmid size by 1350 bp. In order to analyze whether this has an impact on transient gene expression in HEK293E cells, we compared expression levels of both intracellular eGFP and secreted CD40 ligand protein [18]. Both proteins were expressed from the original pTT vectors [2] and their respective pCoofy derivatives. Transient transfection of both genes shows comparable levels of eGFP in the total cell lysate and of CD40 ligand in the culture supernatant when expressed from either pTT or pCoofy (Figure 7A).
In order to test these alternative purification tags, we fused them to eGFP, transiently expressed the fusions in HEK293E cells and purified them with the respective affinity resin, except for AE54CPD54, which is specifically activated by inositol hexakisphosphate (InsP6) present in eukaryotic cells [19]. Comparison of C-terminal S-tag, His10, HPC, CBP and OneStrep showed the best expression levels for both the S and His10 tags. Protein yield was lowest for S-tag, HPC and CBP. The best yield and purity were obtained for eGFP-OneStrep and for eGFP-His10 when washed stringently with 50 mM and 80 mM imidazole (Figure 7B). In summary, we have shown effective protein expression and purification from pCoofy40 vectors, which can now be included in our parallel cloning strategy.

Conclusions
We have developed a method that allows one to tailor any given expression vector for efficient, fast, robust and cost-effective parallel cloning. High cloning efficiency is guaranteed via strong constitutive ccdB expression that, in contrast to DpnI digestion, is 100% efficient in counterselection of parental insert-less vector. The procedure is very robust and has been easily implemented in research groups in-house and externally. We have generated more than 30 parallel vectors for expression in bacteria, insect and HEK293 cells with different purification and solubility enhancing tags that we consider to be helpful in our workflow. This list of pCoofy vectors has fundamentally increased our throughput and success rate in protein purification. We are constantly expanding the list of vectors and have also integrated the ccdB-C-tag cassette into Baculovirus, Pichia pastoris and Hansenula polymorpha vectors, which still need to be validated.
With the strategy presented here it is straightforward to assemble any tag combination of interest for any selected application. Moreover, with the use of SLIC, as many as five inserts can be assembled in one reaction simultaneously with great efficiency [20]. Thus a modular combination of any vector element such as purification tag, signal sequences, antibiotic resistance etc. would be possible. The Llp-ccdB counterselection gene presented in this work could also enhance the cloning efficiency of other cloning methods such as RF cloning.

Proteins are classified as soluble, insoluble or not expressed. "Soluble" proteins could be enriched by IMAC and were further purified with or without tag. Proteins classified as "insoluble" were not soluble in standard IMAC buffers (buffer A). These constructs were either subjected to refolding or buffer screens, or were discontinued. Proteins classified as "not expressed" cover constructs that were expressed at a low or even undetectable level in Western blot analysis and were thus discontinued. (C) Pil1 (UniProt: P53252). SAQ and SFK mutants were SLIC cloned into pCoofy1 (His6), pCoofy2 (His6Trx), pCoofy3 (His6GST) and pCoofy4 (His6MBP) using standard LP1 (3C) and LP2 (ccdB) primers and expressed in BL21 Rosetta (DE3) at 30°C. OD3 samples (0.1 ml culture at an OD600 of 3, lysed in 50 μl sample buffer) were loaded on Agilent BioAnalyzer P80 Chips. Expression levels indicated are based on relative peak quantification.

Vector construction
Molecular biology methods were based on standard protocols. E. coli chemocompetent ccdB Survival cells and OmniMAX™ 2 T1R cells (Life Technologies, Darmstadt, Germany) were used for propagating ccdB plasmids and for transforming cloning reactions, respectively. PCR primers were ordered from Metabion (Martinsried, Germany). PCR was performed in 50 μl reaction mixes using high fidelity Phusion polymerase (NEB, Frankfurt, Germany). PCR products were analyzed on agarose gels and purified with the High Pure PCR cleanup kit (Roche, Mannheim, Germany). Plasmid DNA was prepared using NucleoBond® or NucleoSpin® Plasmid Kits (Macherey Nagel, Düren, Germany). pETM vectors were provided by the EMBL [8], pTT by Yves Durocher [11], and pFastBac was purchased from Life Technologies. Synthetic ccdB DNA containing the promoter, the Shine-Dalgarno sequence and the coding sequence of ccdB was synthesized by Sloning BioTechnology (now part of Morphosys). The 2nd generation cassette was synthesized by GeneArt® (Regensburg, Germany, now Life Technologies). pCoofy E. coli expression vectors were generated by ligating the Llp5-ccdB HindIII restriction fragment from pPCRScript-ccdB into HindIII linearized pETM vectors. Additional tags that were not present in the EMBL vector series were added to the list: His6 was extended to His10 with the use of primer extensions; the OneStrep tag was PCR amplified from pPSG-IBA103-eGFP (IBA, Göttingen, Germany), the S tag was amplified from pET29 (Novagen, Darmstadt, Germany) and the Halo tag was amplified from pFN18a (Promega, Mannheim, Germany) template DNA. The pCoofy transient insect expression vector was derived from pIEX1. The vector was digested with XcmI and NotI and ligated to llp5-ccdB, which had been PCR amplified with XcmI/NotI primer extensions. The pCoofy Baculovirus expression vectors were derived from pFastBac1.
The sequence spanning N-tag-3C-Llp5-ccdB was PCR amplified using pCoofy1, pCoofy3 and pCoofy4 as template DNA to generate pCoofy27, pCoofy28 and pCoofy29, respectively. PCR fragments were extended by RsrII/XhoI restriction sites and ligated into pFastBac1 linearized with RsrII/XhoI. The pCoofy HEK expression vector was derived from pTT5. The vector was linearized with EcoRI/NotI and ligated to the ccdB-C-tag cassette, which was PCR amplified with EcoRI/NotI primer extensions. Prior to use, the integrity of all vectors was verified by DNA sequencing. ccdB toxicity is a prerequisite for efficient counterselection and was verified for each new vector and for each individual vector preparation by transformation into ccdB non-resistant OmniMAX™ 2 T1R cells. Functionality in cloning and expression was controlled for each new vector by eGFP SLIC cloning and small scale test expression in the appropriate host.

SLIC cloning
All vectors were PCR linearized with their corresponding LP1 and LP2 primer combination, purified and stored at −20°C ready-to-use. Briefly, a 50 μl reaction mix containing 25 ng vector DNA, 50 pmol of each primer, 0.4 mM dNTP mix, 1 unit Phusion polymerase and 1x

Protein expression in insect cells
High Five and SF9 suspension cultures were grown in ExCell405 and ExCell420 medium, respectively (Sigma, Munich, Germany) at 27°C in 50 mm Unitron shakers (Infors, Bottmingen, Switzerland). 2 ml test expression cultures were shaken in 25 ml polystyrene screw cap tubes (Sarstedt, Nümbrecht, Germany) at 120 rpm. Cell viability and cell size were monitored on a Vi-CELL® instrument (Beckman Coulter, Krefeld, Germany). Baculovirus expression was performed according to the Bac-to-Bac® protocol (Life Technologies, Darmstadt, Germany). Bacmid transfected SF9 cells were typically harvested after 4-5 days, at maximum cell size and onset of cell lysis. The titer of this first virus stock was determined with the SF9 easy titer cell line [22]. Virus was either amplified in 2 subsequent steps to generate P1 and P2 virus stock or used to generate Baculo Infected Insect Cells (BIIC) as described previously [23]. For test expression, High Five and SF9 cells were infected at 1 × 10^6 cells/ml with virus stock or BIIC at different dilutions, typically in the range of 1:1000 to 1:10,000, and harvested at different time points. Viability and cell size were recorded for every expression culture.
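The infection set-up above comes down to simple arithmetic, sketched here for a 2 ml test culture. The helper names are hypothetical, and a dilution of 1:N is interpreted as one volume of virus stock or BIIC per N volumes of culture, which is an assumption about how the dilutions are expressed.

```python
# Arithmetic for setting up small-scale insect cell test expressions.
# Assumption: a 1:N dilution means 1 volume of stock per N volumes of culture.


def cells_required(culture_ml: float, density_per_ml: float = 1e6) -> float:
    """Total number of cells needed to seed a culture at the given density."""
    return culture_ml * density_per_ml


def inoculum_volume_ul(culture_ml: float, dilution_factor: float) -> float:
    """Volume of virus stock or BIIC (in microliters) for a 1:N dilution."""
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    return culture_ml * 1000.0 / dilution_factor


if __name__ == "__main__":
    culture_ml = 2.0  # test expression volume used in the text
    print(f"cells needed: {cells_required(culture_ml):.1e}")
    for factor in (1000, 3000, 10000):  # 3000 is an arbitrary intermediate
        volume = inoculum_volume_ul(culture_ml, factor)
        print(f"1:{factor} -> {volume:.2f} microliters of stock")
```

At the upper end of the dilution range the volumes fall well below one microliter, so in practice an intermediate pre-dilution of the stock would be pipetted instead.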