Combined protein construct and synthetic gene engineering for heterologous protein expression and crystallization using Gene Composer
© Raymond et al; licensee BioMed Central Ltd. 2009
Received: 15 October 2008
Accepted: 21 April 2009
Published: 21 April 2009
With the goal of improving yield and success rates of heterologous protein production for structural studies we have developed the database and algorithm software package Gene Composer. This freely available electronic tool facilitates the information-rich design of protein constructs and their engineered synthetic gene sequences, as detailed in the accompanying manuscript.
In this report, we compare heterologous protein expression levels from native sequences with those from codon engineered synthetic gene constructs designed by Gene Composer. A test set of proteins including a human kinase (P38α), a viral polymerase (HCV NS5B), and a bacterial structural protein (FtsZ) was expressed in both E. coli and a cell-free wheat germ translation system. We also compare the protein expression levels in E. coli for a set of 11 different proteins with greatly varied G:C content and codon bias.
The results consistently demonstrate that protein yields from codon engineered Gene Composer designs are as good as or better than those achieved from the synonymous native genes. Moreover, structure guided N- and C-terminal deletion constructs designed with the aid of Gene Composer can lead to greater success in gene to structure work as exemplified by the X-ray crystallographic structure determination of FtsZ from Bacillus subtilis. These results validate the Gene Composer algorithms, and suggest that using a combination of synthetic gene and protein construct engineering tools can improve the economics of gene to structure research.
The gene to structure endeavor faces intrinsic challenges at several steps in the experimental pathway. The primary challenge is expressing the large quantities of protein required to support structural studies, and this challenge can be magnified by the need for heterologous expression systems. While many research pursuits begin with protein purification, the extent of material required is a burden somewhat unique to protein crystallography. A second major problem of protein crystallography is identifying a suitable protein construct, given that small changes in the protein can have profound effects on crystallization [1, 2]. These two problems are related since small changes in the protein construct can also have profound effects on the level of protein expression, solubility, and diffraction quality of protein crystals obtained. One recent report notes that construct engineering can double the number of targets expressing soluble protein in a heterologous system, and can provide a fourfold increase in the likelihood of obtaining well-diffracting crystals. These aggregate improvements can enable extensive crystallization trials.
It is well known that codon utilization is highly biased and varies considerably among organisms, and codon usage is considered a key determinant of eventual heterologous protein expression. Codon bias in genomes can be defined as the unequal usage of synonymous codons in known or predicted open reading frames (ORFs). Codon-usage patterns are related to the relative abundance of tRNA isoacceptors, and genes encoding highly expressed proteins show differences in their codon usage frequencies. Extensive research by Karlin and Sharp has demonstrated that highly expressed genes in E. coli and other bacteria have a significant bias towards certain subsets of codons [5, 6]. Moreover, bacteria and unicellular eukaryotic organisms seem to have codon biases that are highly correlated with measured isoaccepting tRNA levels [7–9]. In general, where it has been measured, higher eukaryotes also appear to have codon biases that complement the abundance of isoaccepting tRNAs [10, 11].
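The definition of codon bias given above (unequal usage of synonymous codons within an ORF) can be illustrated with a minimal Python sketch. This is not part of Gene Composer; the function name `synonymous_codon_fractions` is hypothetical, the standard genetic code is assumed, and the input is assumed to contain only unambiguous A, C, G and T bases.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codon_fractions(orf):
    """Return, for each codon seen in the ORF, its share among the codons
    that encode the same amino acid (1.0 means no synonymous alternative
    was used in this sequence)."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: n / per_amino_acid[CODON_TABLE[codon]]
        for codon, n in counts.items()
    }


# Toy ORF (hypothetical): leucine is encoded three times by CTG, once by CTA.
fractions = synonymous_codon_fractions("ATGCTGCTGCTACTGAAATAA")
print(fractions["CTG"], fractions["CTA"])
```

Applied to a whole genome or to a set of highly expressed genes rather than a single toy ORF, the same tally would give the kind of codon usage table that gene design tools consume.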
It is generally thought that codon usage alters peptide elongation rates; however, codon-usage patterns can also improve the fidelity and kinetic efficiency of translation. Since the levels of isoaccepting tRNAs in E. coli are correlated with codon bias, it is likely that the presence of rare codons has a negative impact on expression of heterologous genes. It is also likely that the kinetics of translation are improved by more closely matching the codon usage of a recombinant gene product to the usage of the expression host [3, 13–17]. Apart from the nonrandom use of codons, it has also been noted that codon/anticodon recognition is influenced by sequences outside the codon itself, a phenomenon termed codon context [18–20]. There is, for example, an occurrence bias between specific adjacent codon pairs, and these biases are different for highly expressed versus low expressed proteins in E. coli [21–23]. Clearly, there are still a number of mysteries in the subject of codon bias and codon context. Taken together, these results also emphasize how difficult it is to create "codon optimized" genes for expressing proteins, since the rules are complex and not yet fully defined.
As noted recently, more empirical evidence about protein expression from engineered genes is still needed, and efforts are being made to database the outcomes of gene engineering [24, 25]. There remain a variety of other factors that should be considered when engineering synthetic genes, and the details of "optimal" gene design continue to evolve. For example, the synthetic gene sequence should not only contain the proper codon bias and codon context, but should also not result in mRNA secondary structures that may inhibit translation. Additional considerations such as unusual sequence repeats and cryptic regulatory sequences (e.g. Shine-Dalgarno, splice sites, RNase cleavage sites, etc.) can impact the fidelity and yield of protein expression.
Large scale projects in genomic sequencing and protein structure determination are producing enormous quantities of data on the relationships between 2D gene sequence and 3D protein structure. Moreover, such efforts are providing experimental data on success factors at every step in the gene to structure research endeavor. This wealth of information should be used in a feedback cycle to facilitate the design and production of genes and protein constructs that are engineered for the successful production of functional protein samples for structural studies. Fundamentally, this goal represents a bioinformatics software challenge.
To better address this issue in the context of synthetic gene design, we have created a Protein Construct Design interface for Gene Composer which distills protein structure information from PDB files and comparative sequence information into a graphical user interface that allows the user to simultaneously analyze known protein structures, together with homologous target protein sequences for which derivative constructs can be designed. This interface allows the user to simultaneously understand sequence conservation, ligand contacts, and known or predicted structural elements to define candidate protein constructs for crystallization and then design harmonized nucleic acid sequences to express those proteins.
To demonstrate the utility of this approach, we have used Gene Composer to design protein constructs, as well as the synthetic gene sequences to express those proteins. We compared the resulting protein expression level to full length proteins expressed from native genes. We initially chose to examine proteins from three different sources (human, viral and eubacterial) in two different heterologous expression systems (E. coli and wheat germ in vitro cell free expression). The results show that Gene Composer designed proteins and their corresponding engineered genes can be expressed at significantly higher levels than native genes. In addition, the results show that using Gene Composer to design protein constructs can significantly improve the likelihood of obtaining protein crystals.
Results and discussion
Construct Design Using Gene Composer™ Software
A similar analysis was performed with NS5B and showed that C-terminal residues were not conserved and not visible within the electron density maps of several crystal structures. From this analysis, we identified a C-terminal truncation spanning residues 1–568 for expression testing. Finally, an analysis of P38α showed that residues across the entire protein were relatively highly conserved and almost all residues could be visualized in published crystal structures. Based on these results, we did not create a truncated form and only attempted expression of the full-length protein.
At this point, each target was moved into the Gene Design module (Protein-to-DNA) of Gene Composer™. For engineered genes, the nucleic acid sequence was determined by back-translation from the amino acid sequence. As previously described, multiple factors were incorporated in the final design including E. coli codon usage, overall G:C content, potential mRNA secondary structures, presence of out-of-frame stop codons, removal of cryptic Shine-Dalgarno sequences, removal of repeated strings of the same nucleic acid sequence, as well as the silent introduction and removal of select restriction sites to facilitate cloning. For native genes, the wild-type sequences were input with no sequence alteration. The resulting full length engineered genes and native genes express identical proteins, and were cloned into identical expression vectors.
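As a rough illustration of back-translation against a codon usage table, the following Python sketch picks, for each residue, the most frequent codon that passes a minimum usage threshold (the Methods state a 2% minimum; how Gene Composer applies it is not described here, so treating it as a per-amino-acid fraction is an assumption). The table `TOY_USAGE` holds placeholder numbers, not measured E. coli frequencies, and `allowed_codons` and `back_translate` are hypothetical names. This simple "most frequent codon" rule is a stand-in, not the Gene Composer algorithm.

```python
# Hypothetical usage table: relative codon fractions per amino acid.
# The numbers are placeholders for illustration, NOT measured E. coli values.
TOY_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "L": {"CTG": 0.60, "TTA": 0.39, "CTA": 0.01},
    "*": {"TAA": 0.70, "TGA": 0.30},
}


def allowed_codons(amino_acid, usage, min_fraction=0.02):
    """Codons for one amino acid whose usage fraction meets the threshold."""
    return {
        codon: fraction
        for codon, fraction in usage[amino_acid].items()
        if fraction >= min_fraction
    }


def back_translate(protein, usage, min_fraction=0.02):
    """Back-translate a protein by taking the most used allowed codon
    at every position ('*' stands for a stop codon)."""
    dna = []
    for amino_acid in protein:
        candidates = allowed_codons(amino_acid, usage, min_fraction)
        if not candidates:
            raise ValueError(f"no codon above threshold for {amino_acid!r}")
        dna.append(max(candidates, key=candidates.get))
    return "".join(dna)


print(back_translate("MKL*", TOY_USAGE))
print(sorted(allowed_codons("L", TOY_USAGE)))  # the 1% codon CTA is excluded
```

A real design would start from such a draft and then revise individual codons to satisfy the other constraints listed above (G:C content, secondary structure, restriction sites), always choosing among the synonymous codons that remain above the threshold.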
Protein Expression in E. coli
Protein constructs used in this test set.
Crystallography and Structure Determinations
It has been shown previously that nucleotide exchange occurs readily and has negligible conformational effects in certain FtsZ structures. To determine if this was the case for the B. subtilis FtsZ structure, crystals of FtsZ:GDP were soaked in the presence of GTP-γ-S. The crystals (FtsZ:GSP) tolerated the soaks well and diffracted to the same resolution (2.45 Å) as FtsZ:GDP crystals. Similar to FtsZ:GDP, this structure [PDB code: 2RHO] was traced from Ala12 to Phe315. However, the residual electron density at the C-terminus of subunit B was weak and the purification tag could only be traced to Leu316. Also, residues Lys221 and Gly222 on chain B were disordered and were not modeled. Examination of the difference electron density revealed that subunit A contained GTP-γ-S, as there was clear density for the γ-phosphate. However, the GDP in subunit B was not exchanged, as evidenced by difference density that was consistent with GDP. Interactions between GTP-γ-S and FtsZ are similar to those observed in the FtsZ:GDP structure except that the γ-phosphate forms contacts with Ala71 and Ala73 of the T3 loop.
Two additional crystal forms were obtained in the presence of lithium sulfate. A primitive tetragonal form, FtsZ:SO4, was found to contain a single sulfate bound in the nucleotide pocket [PDB code: 2RHH]. This sulfate ion occupies a position similar to that of the β-phosphate of GDP and GTP-γ-S observed for FtsZ:GDP and FtsZ:GSP. Not surprisingly, the sulfate ion forms non-bonded contacts with residues in the T1 and T4 loops. The sulfate ion is positioned in a nearly identical manner to that observed in a recently published full length Bacillus FtsZ structure. A C-centered orthorhombic crystal form (FtsZ:2SO4) was found to contain two sulfate ions in the nucleotide binding pocket [PDB code: 2RHJ]. Additionally, prominent difference density, clearly visible at a 6σ contour level, was observed between the two sulfate ions. Initially, a water molecule was placed at this site; however, this water was surrounded by a large amount of positive difference electron density following refinement. The unidentified atom appeared to be coordinated by two water molecules and two oxygen atoms of the sulfates in a distorted square planar arrangement. We considered the possibility that this was a magnesium ion. However, although four-coordinate magnesium is possible, magnesium typically forms six-coordinate interactions. We then used the program Wasp to examine the solvent structure for potential metal ions. The water positioned at this site of positive difference density was flagged as a possible sodium ion. Therefore, a sodium ion was subsequently positioned in the difference density and refined. The refined metal-oxygen non-bonded distances and coordination geometry were consistent with those expected for a sodium ion. In this structure, the sulfate ions adopt positions similar to those of the α and γ phosphates of GTP-γ-S found for FtsZ:GSP. Similarly, non-bonded contacts are formed between the T1, T3, and T4 loops of FtsZ and the sulfate ions.
These results clearly show that proteins expressed from a synthetic Gene Composer engineered gene can be successfully used in protein crystallization experiments. More importantly, the results also show that protein construct engineering using structural and phylogenetic information can be critical for successful crystallization of a protein target since only the truncated form of FtsZ yielded protein crystals.
Protein Expression in E. coli Under High-Throughput Conditions
To begin to understand how often engineering of the nucleic acid sequence will improve protein expression, we randomly chose eleven different proteins from three different bacterial species on the National Institute of Allergy and Infectious Disease priority pathogens list, and compared protein expression from full length wild type genes and full length Gene Composer engineered genes. The proteins were expressed from identical expression vectors, were amended with identical purification tags, and the final protein product pairs had identical amino acid sequences. All of the expressed proteins contained 6× His tags to facilitate quantitation of soluble protein expression. In these examples the proteins were expressed using autoinduction media.
Protein Expression in CFS
In order to more accurately compare the protein expression levels, the soluble 6× His tagged recombinant proteins were purified from the CFS extracts by nickel chelate bead capture and resolved by SDS-PAGE with Coomassie blue staining as shown in Figure 6B. The NS5B proteins were not captured by nickel chelate and were shown to be located entirely in the insoluble fraction of the expression extract (data not shown). Quantification of the full length purified FtsZ proteins shows an approximate 2-fold increase in protein expressed from the engineered gene relative to the native gene, while expression of full length P38α from both sources is equal. Thus, the difference in protein expression levels from synthetic engineered P38α and FtsZ genes relative to their native counterparts is modest in the CFS experiment, and does not recapitulate the outcomes observed when these genes were expressed in E. coli cells (see Figure 2). Taken together, these results suggest that the rare codons present in the native gene sequences of human P38α and B. subtilis FtsZ can significantly hamper expression of these proteins in E. coli, but this can be overcome either by codon engineering for E. coli or by providing an excess of molecular translation machinery in the wheat germ CFS system. Moreover, the synthetic codon engineered versions of these genes do not appear to offer any significant advantage for protein production in the CFS wheat germ extract.
In order to explore the possibility that codon engineering of synthetic genes for high level E. coli expression could affect protein production in a wheat germ extract, we compared the CFS expression of synonymous pairs of genes from Arabidopsis thaliana whose codon usage is comparable to that of wheat, wherein the native A. thaliana gene is compared to that of a synthetic codon engineered gene designed with codon usage of wheat genes. In all cases, the level of CFS expression from the wheat codon engineered gene was the same or slightly higher than expression from the native Arabidopsis gene (data not shown). This data is consistent with the suggestion that translation machinery is not limiting for protein production in the CFS wheat germ extract.
Obtaining sufficient quantities of soluble protein for structural studies often requires expressing proteins in heterologous expression systems. Unfortunately, the native nucleic sequence of any gene is likely to be biased from the evolution of that gene for expression in the native host, and in some cases, the gene may have selective pressures to be naturally expressed at very low levels. This bias may affect codon usage, codon context, mRNA secondary structures, regulatory elements, etc. which will likely impact the level of expression in a heterologous system. In order to overcome this fundamental problem, we have developed Gene Composer software to engineer any nucleic acid sequence for any given expression system (sister manuscript). In this report, we have systematically compared protein expression from native genes to protein expression from genes specifically engineered for high level expression in E. coli.
The majority of genes tested did not result in improved levels of protein expression. There are several possible explanations for this result. First, the yield of soluble protein may be dominated by the balance of protein turnover. If a protein is extremely unstable, increases in the rate of translation or fidelity of translation will have only small effects on the steady-state concentration of that protein in the cell. Future pulse-chase experiments designed to assess rates of protein translation are necessary to address this possibility. Another possibility is that the results reflect our lack of understanding of the many factors that are important for protein expression and the likelihood that the "rules" are different for individual genes and proteins. For example, the engineered synthetic genes were harmonized for codon usage, absence of sequence repeats, absence of repeating codons within a sequence, reduction of local mRNA secondary structures within the transcript, and removal of cryptic Shine-Dalgarno sequences; however, other factors may also be important for expression. It is not possible to test these unknown factors at this point, but synthetic gene design offers the ability to incorporate and test new gene design features as they become known.
Summary of rare codon usage in full length native genes.
FtsZ FL native
NS5B FL native
P38α FL native
The ability to express significant quantities of FtsZ enabled the crystallization of this target and demonstrates the utility of Gene Composer for protein design and construct engineering for structural studies. Specifically, we used Gene Composer to define structure-guided protein constructs for crystallization studies and showed that we were unable to obtain protein crystals of full length FtsZ, but did obtain multiple crystal hits from the truncated, engineered FtsZ construct (see Figures 1 and 3).
In conclusion, these results demonstrate the utility of Gene Composer for the design of protein constructs and synthetic genes to express those proteins. The use of protein engineering to improve crystallization can have a significant impact on the gene to crystal structure process since very complex protein construct engineering principles (fusions, rearrangements, rational surface variants, etc.) can be rapidly implemented without multiple molecular biology steps that can require a significant amount of time and effort. Because engineering of the nucleic acid sequence did not decrease the level of protein expression and in some cases dramatically improved expression, synthetic gene design can also significantly improve the gene to crystal structure process. The utility of Gene Composer for engineering nucleic acid sequences will improve as more design principles are identified, especially as the cost of gene synthesis continues to decrease and robust protocols for rapidly synthesizing genes become more available.
Gene Composer engineering
Gene Composer engineering considerations included a codon usage frequency minimum threshold of 2%, elimination of undesired restriction sites, elimination of five or more nucleotide repeats, placement of stop codons every 100 bp in the second and third reading frames, elimination of cryptic Shine-Dalgarno sequences, and optimized Tm and ΔG. Detailed information about engineering with Gene Composer can be found in (Sister Manuscript).
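Several of the sequence-level considerations listed above can be checked with simple scans. The Python sketch below is illustrative only and is not taken from Gene Composer: it reports stop codons in a chosen reading frame, runs of five or more identical nucleotides (one possible reading of "five or more nucleotide repeats", assumed here), and occurrences of the Shine-Dalgarno core motif AGGAGG (an assumed stand-in for whatever pattern set the software actually uses). All function names are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, frame):
    """0-based start positions of stop codons read at frame offset 0, 1 or 2."""
    return [
        i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS
    ]


def homopolymer_runs(seq, min_len=5):
    """(start, run) pairs for stretches of min_len or more identical bases."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(0)) for m in pattern.finditer(seq)]


def motif_positions(seq, motif="AGGAGG"):
    """0-based start positions of a motif, overlapping matches included."""
    return [m.start() for m in re.finditer("(?=%s)" % re.escape(motif), seq)]


# Toy sequence (hypothetical) that triggers each check once or twice.
toy = "ATGAAAAATGACCTAGGAGGTTAA"
print(stop_positions(toy, 1), stop_positions(toy, 2))  # out-of-frame stops
print(homopolymer_runs(toy))
print(motif_positions(toy))
```

In a design loop, out-of-frame stops are desirable (they terminate frameshifted translation), whereas homopolymer runs and internal Shine-Dalgarno-like motifs would be flagged for removal by synonymous codon substitution.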
All amplifications were performed in 50 μl reactions using KOD Hi-Fi polymerase (Novagen) according to the manufacturer's instructions. Resulting PCR products were resolved on 1% agarose gels in 1× TBE + EtBr, and subsequently gel-purified using Qiagen gel extraction purification protocol. Amplification products not requiring gel purification were purified using the microfuge PCR purification kit protocol. DNA yields were assessed with a Beckman Coulter DU 500 spectrophotometer. Ligation reactions were performed with T4 DNA Ligase (NEB), using a 1:3 molar ratio of vector and insert DNA. Chemically competent OneShot TOP10 cells were transformed with 6 μl of each ligation using a standard heat shock protocol. Cells were pelleted, resuspended in 50 μl, and spread on 10 cm 2XYT plates containing the appropriate antibiotic. Following 16 hour incubation at 37°C, transformant colonies were screened by PCR and positive clones were sequence verified.
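The 1:3 vector-to-insert molar ratio used for ligation translates into DNA masses through the fragment lengths, because the mass per mole of double-stranded DNA scales with its length in base pairs. The following Python sketch shows that arithmetic; the function name and the example vector and insert sizes are hypothetical and not taken from the experiments described here.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector_ratio=3.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    Moles are proportional to mass / length for double-stranded DNA, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * ratio.
    """
    return vector_ng * insert_bp * insert_to_vector_ratio / vector_bp


# Hypothetical example: 50 ng of a 5000 bp vector and a 1000 bp insert.
print(insert_mass_ng(50, 5000, 1000))
```

For these assumed sizes, 30 ng of insert per 50 ng of vector corresponds to a threefold molar excess of insert.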
Bacterial expression of P38α, NS5B, and FtsZ
Gene Composer engineered and native full length FtsZ genes were purchased from DNA2.0, and subcloned into pET28 by AflIII/BamHI digestion. The engineered genes were optimized for expression in E. coli using E. coli codon usage tables and design algorithms (Sister Manuscript). For the eight other constructs in this test set, sequence verified pCRBluntIITopo clones were amplified by PCR. Truncated FtsZ native and Gene Composer engineered inserts were amplified with primers that incorporate a 5' NcoI site before the first Met ATG and add two stop codons with a 3' BamHI site following the affinity tags. Native P38α was amplified using primers incorporating a flanking 5' NdeI site such that, after subcloning, the 5' ATG would fall in-frame with the vector derived N-terminal tags. The 3' primer added two stop codons and a BamHI site. The resulting PCR product was purified using the Qiagen PCR purification kit and digested sequentially, first with NdeI (Fermentas) and then, after ethanol precipitation, with BamHI (Fermentas). Due to an internal NdeI site, two digestion fragments were gel-purified: a 5' 238 bp fragment, flanked on both sides by NdeI, and a 3' 857 bp fragment, with 5' NdeI and 3' BamHI termini. A 3-fold molar excess of pooled fragments was combined with NdeI/BamHI-digested pET15b in a standard ligation. Gene Composer engineered P38α was cloned in the same fashion, except that no internal NdeI site was present. The four NS5B constructs were amplified with primers which add a 5' NcoI site flanking the initiating methionine ATG, followed by a 6× His tag. Two stop codons were added after the last amino acid, followed by a 3' BamHI site for subcloning into NcoI/BamHI-digested pET15b. Cloning of full length and truncated wild-type NS5B was complicated by the presence of two internal NcoI restriction sites; desired constructs were the product of multiway ligation into pET15b. All transformant colonies were screened by PCR, and sequence verified.
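Internal restriction sites such as the extra NdeI site in native P38α can be anticipated by scanning the amplicon before cloning. The Python sketch below locates cut positions on the top strand and lists the resulting fragment lengths for a linear product. The recognition sequences and cut offsets (NdeI, CA^TATG; BamHI, G^GATCC) are standard, but the function names and the toy sequence are hypothetical; the real P38α amplicon is not reproduced here.

```python
def cut_positions(seq, site, offset):
    """Top-strand cut coordinates for every occurrence of a recognition site.

    offset is the number of bases of the site that lie 5' of the cut.
    """
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, enzymes):
    """Top-strand fragment lengths after complete digestion of a linear DNA.

    enzymes is a list of (recognition_site, offset) pairs.
    """
    cuts = sorted(
        p for site, offset in enzymes for p in cut_positions(seq, site, offset)
    )
    edges = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(edges, edges[1:])]


NDEI = ("CATATG", 2)    # CA^TATG
BAMHI = ("GGATCC", 1)   # G^GATCC

# Toy amplicon (hypothetical): 5' NdeI, one internal NdeI, 3' BamHI.
toy = "GG" + "CATATG" + "A" * 20 + "CATATG" + "C" * 30 + "GGATCC" + "GG"
print(cut_positions(toy, *NDEI))
print(fragment_lengths(toy, [NDEI, BAMHI]))
```

Two NdeI cut positions in the toy amplicon signal the same situation described above: the insert is released as two pieces that must both be recovered and ligated together into the vector.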
Cell free expression of P38α, NS5B, and FtsZ
Sequence verified whole gene synthesis products of native and Gene Composer engineered versions of P38α and NS5B were used as PCR templates for cloning into vector pEU-E01. Full length native and Gene Composer engineered FtsZ ORFs were ordered from DNA 2.0, and served as template for cloning of the FtsZ expression constructs. Amplification primers for all six constructs added a 5' SpeI site and 3' XhoI site for cloning into SpeI/XhoI-cut pEU-E01 (CFS). Following spin purification of PCR products, approximately 1 ug PCR product was digested with SpeI/XhoI (Roche) and gel purified. Ligations were set up as described previously, and transformation reactions were spread on 2XYT ampicillin 100 ug/ml plates.
E. coli expression of pET P38α, NS5B, and FtsZ constructs
Cultures of 2YT supplemented with antibiotic were inoculated with single colonies of BL21(DE3) cells that had been transformed with each sequence verified pET28 or pET15 construct. Following overnight growth at 37°C, cultures were diluted 100-fold in the same medium to a final volume of 1L, shaken at 235 rpm at 37°C until OD600 reached approximately 0.8. Cultures were induced with 0.4 mM IPTG and incubated at 30°C for 4 hours. Bacterial cells were harvested by centrifugation and frozen at -20°C. From each expression, 1.0 g of cell paste was lysed in 20 mM Tris-HCl (pH 8.0), 500 mM NaCl, 20% (v/v) Glycerol, Complete-EDTA free protease inhibitor (Roche), 25 units Benzonase, and 0.1 mg/mL lysozyme and sonicated on a Branson Sonifier using a microtip, on 70% duty cycle, output 7, for 45 seconds each. Equal amounts of total protein, as quantified by A280 measurement on NanoDrop ND-1000 spectrophotometer were analyzed by SDS-PAGE. Samples were resolved on 4–12% Bis-Tris NuPage 12 well 1 mm gels in 1× MES running buffer, using reducing SDS-PAGE sample buffer and heating samples to 95°C 5 mins prior to loading and visualized by Coomassie blue staining, following manufacturer instructions. Protein bands were visualized on a Kodak Imager and quantified with the software ImageQuant 5.2 (Molecular Devices).
E. coli expression of pET constructs of Brucella, Burkholderia, Rickettsia targets and pET FtsZ constructs
Pre-cultures (1.2 mL/well, in 96-well deep-well, round bottom block) of 2YT supplemented with antibiotic and 0.5% glucose were inoculated with 15% glycerol stocks made from isolated colonies of BL21(DE3) cells that had been transformed with each sequence verified pET28 construct. Pre-cultures were grown at 37°C, 220 rpm for 16 hours. Cultures (1.2 mL/well, in 96-well deep-well, round bottom block) of 2YT supplemented with antibiotic and Autoinduction System 1 (Novagen) were inoculated with 40 uL of pre-culture and grown for 40 hours at 20°C, 220 rpm. Cultures were harvested by centrifugation and stored at -20°C for at least 1 hour. His-tagged recombinant proteins were purified with magnetic Ni-NTA beads (Qiagen) according to manufacturer protocols on a BioRobot 3000 (Qiagen). Purified recombinant products were analyzed by capillary electrophoresis on Lab Chip 90 (Caliper) using an HT Protein Expression Assay Kit, according to manufacturer protocols.
Cell Free Systems
All sequence verified constructs for wheat germ cell free expression (CFS) were purified with an ultra pure plasmid miniprep kit (Marligen) and set to 1 ug/ul. Three expression campaigns were carried out on all FtsZ, NS5B, and P38α constructs. Due to the presence of His tags on all constructs, an H-kit was used. The H-kit contains all components necessary for transcription and translation. All steps were performed according to the manufacturer's instructions for small scale batch mode expression (WEPRO1240H Expression Kit_E_ver. 2.0, Aug. 13, 2007) [42, 43]. The RNA was quantified (A260 absorbance measurements) and normalized to provide the same amounts of each mRNA sample to the cell free translation mix.
In brief, mRNA encoding each protein was synthesized by in vitro transcription (20 ul) of 2 ug of the pEU plasmid carrying the gene under control of the SP6 RNA polymerase promoter (37°C, 6 hrs). To quantitate RNA, transcription samples were diluted 1:50 and quantitated against a blank of 1:50 dilution of transcription buffer at A260 on a Beckman Coulter DU500 spectrophotometer. Thirty-three ug RNA was added to 10.8 ul WEPRO 1240H containing 80 ng/ul creatine kinase and this mixture was laid under 208 ul 1× SUB-AMIX (24 mM HEPES/KOH (pH 7.8), 1.2 mM ATP, 0.25 mM GTP, 16 mM creatine phosphate, 250 units of RNasin (ribonuclease inhibitor), 1 mM dithiothreitol, 0.4 mM spermidine, 0.3 mM each of the 20 amino acids, 2.7 mM magnesium acetate and 100 mM potassium acetate). Translation was carried out overnight at 15–22°C. The following day the bilayer was mixed to homogeneity, and then 100 ul of crude extract was removed and centrifuged at 13,000 rpm and 4°C for 15 minutes. The supernatant was removed as the "soluble" fraction and the small remaining pellet was resuspended in 100 ul of 10 mM Tris pH 7.5 (the "insoluble" fraction). For large scale expression and purification performed by the Desktop II robot (CFS), an H-kit was used, running 6 ml × 6 reaction cups, one construct per cup (six constructs total: NS5B FL, FtsZ FL, and P38α; native and engineered versions of each). The DT II program transcribes, translates, and Ni-IMAC purifies targets with a final elution in 0.5 M imidazole. It was carried out using default parameters (6 hours transcription, 16 hours translation at 15°C) and according to the manufacturer's instructions with the following exception: the robot was paused after the transcription step but before addition of wheat germ to the RNA, so that the RNA for each sample could be manually adjusted to 2600 ug/ml (0.3 ml RNA/sample). After this adjustment, automation was resumed through purification.
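The RNA quantitation and normalization steps above reduce to two small calculations, sketched here in Python. The conversion factor of 40 ug/ml per A260 unit is the conventional value for single-stranded RNA at a 1 cm path length and is an assumption, since the factor used in this work is not stated; the A260 reading in the example is hypothetical, and both function names are illustrative.

```python
def rna_conc_ug_per_ml(a260, dilution_factor, ug_per_ml_per_a260=40.0):
    """RNA concentration of the undiluted sample from a blanked A260 reading."""
    return a260 * ug_per_ml_per_a260 * dilution_factor


def stock_volume_ml(stock_ug_per_ml, target_ug_per_ml, final_volume_ml):
    """Volume of RNA stock to dilute to the target concentration (C1*V1 = C2*V2)."""
    if stock_ug_per_ml < target_ug_per_ml:
        raise ValueError("stock is more dilute than the target concentration")
    return target_ug_per_ml * final_volume_ml / stock_ug_per_ml


# Hypothetical reading: A260 of 1.5 measured on a 1:50 dilution.
stock = rna_conc_ug_per_ml(1.5, 50)
volume = stock_volume_ml(stock, 2600, 0.3)  # target 2600 ug/ml in 0.3 ml
print(stock, round(volume, 3), round(0.3 - volume, 3))
```

With these assumed numbers the stock is 3000 ug/ml, so 0.26 ml of RNA plus 0.04 ml of diluent would give 0.3 ml at 2600 ug/ml; a sample below the target concentration cannot be adjusted by dilution and raises an error.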
Protein concentrations of all fractions were estimated with a detergent compatible Protein Assay (Bio-Rad) according to the manufacturer's protocols.
Coordinates and structure factors for a FtsZ double deletant construct (Δ1–11, Δ316–382) with GDP bound have been deposited to the Protein Data Bank with accession code 2RHL. Crystals were obtained at 20°C in Compact Jr. (Emerald BioSystems) sitting drop plates using the CryoF crystallization screen (Emerald BioSystems) and 1 μL of protein with 1 μL of reservoir solution. The following conditions yielded the FtsZ crystals: 30% PEG 200, 100 mM MES pH 6.0, 5% PEG 3000. All crystals were frozen in a fresh drop of their respective crystallant which also served as a cryoprotectant.
Data Collection, Structure Solution and Refinement
Crystallographic data for Bacillus subtilis FtsZ structures.
Unit Cell a, b, c (Å): 66.37, 66.37, 152.09; 74.14, 82.06, 117.51; 82.29, 97.58, 135.45; 82.30, 97.16, 134.72
Resolution (Å) range/highest resolution shell: 50 - 2.0; 50 - 1.76; 30 - 2.45; 50 - 2.45
Number of atoms (protein/ligand/water): 2199, 2178b/32, 28/99
r.m.s.d. bond lengths (Å)
r.m.s.d. bond angles (°)
Average B factor all (Å2)
Average B factor protein (Å2)
Average B factor ligand (Å2)
Average B factor water (Å2)
Coordinate Error (Å)e
Ramachandran Analysis (%)
Availability and requirements
Gene Composer software can be downloaded from http://www.genecomposer.net.
Special thanks to Kai Post and Kathryn Hjerrild for contributions to Gene Composer and gene synthesis. This work was supported in part by the NIGMS-NCRR co-sponsored PSI-2 Specialized Center Grant U54 GM074961 for the Accelerated Technologies Center for Gene to 3D Structure, and by the Seattle Structural Genomics Center for Infectious Disease funded by the NIAID Contract HHSN266200700057C.
- Malawski GA, Hillig RC, Monteclaro F, Eberspaecher U, Schmitz AA, Crusius K, Huber M, Egner U, Donner P, Muller-Tiemann B: Identifying protein construct variants with increased crystallization propensity – a case study. Protein Sci. 2006, 15: 2718-2728.
- Graslund S, Sagemark J, Berglund H, Dahlgren LG, Flores A, Hammarstrom M, Johansson I, Kotenyova T, Nilsson M, Nordlund P, Weigelt J: The use of systematic N- and C-terminal deletions to promote production and structural studies of recombinant proteins. Protein Expr Purif. 2008, 58: 210-221.
- Lithwick G, Margalit H: Hierarchy of sequence-dependent features associated with prokaryotic translation. Genome Res. 2003, 13: 2665-2673.
- Osawa S: Evolution of the Genetic Code. 1995, Oxford: Oxford University Press.
- Karlin S, Mrazek J, Campbell A, Kaiser D: Characterizations of highly expressed genes of four fast-growing bacteria. J Bacteriol. 2001, 183: 5025-5040.
- Sharp PM, Devine KM: Codon usage and gene expression level in Dictyostelium discoideum: highly expressed genes do 'prefer' optimal codons. Nucleic Acids Res. 1989, 17: 5029-5039.
- Sharp PM, Tuohy TM, Mosurski KR: Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 1986, 14: 5125-5143.
- Bennetzen JL, Hall BD: Codon selection in yeast. J Biol Chem. 1982, 257: 3026-3031.
- Ikemura T: Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985, 2: 13-34.
- Moriyama EN, Powell JR: Codon usage bias and tRNA abundance in Drosophila. J Mol Evol. 1997, 45: 514-523.
- Duret L: tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. 2000, 16: 287-289.
- Bulmer M: The selection-mutation-drift theory of synonymous codon usage. Genetics. 1991, 129: 897-907.
- Kane JF: Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr Opin Biotechnol. 1995, 6: 494-500.
- Gustafsson C, Govindarajan S, Minshull J: Codon bias and heterologous protein expression. Trends Biotechnol. 2004, 22: 346-353.
- Cormack BP, Bertram G, Egerton M, Gow NA, Falkow S, Brown AJ: Yeast-enhanced green fluorescent protein (yEGFP): a reporter of gene expression in Candida albicans. Microbiology. 1997, 143: 303-311.
- Collins K, Gandhi L: The reverse transcriptase component of the Tetrahymena telomerase ribonucleoprotein complex. Proc Natl Acad Sci USA. 1998, 95: 8485-8490.
- Burgess-Brown NA, Sharma S, Sobott F, Loenarz C, Oppermann U, Gileadi O: Codon optimization can improve expression of human genes in Escherichia coli: A multi-gene study. Protein Expr Purif. 2008, 59: 94-102.
- Hagervall TG, Bjork GR: Undermodification in the first position of the anticodon of supG-tRNA reduces translational efficiency. Mol Gen Genet. 1984, 196: 194-200.
- Bouadloun F, Srichaiyo T, Isaksson LA, Bjork GR: Influence of modification next to the anticodon in tRNA on codon context sensitivity of translational suppression and accuracy. J Bacteriol. 1986, 166: 1022-1027.
- Bjork GR, Durand JM, Hagervall TG, Leipuviene R, Lundgren HK, Nilsson K, Chen P, Qian Q, Urbonavicius J: Transfer RNA modification: influence on translational frameshifting and metabolism. FEBS Lett. 1999, 452: 47-51.
- Gutman GA, Hatfield GW: Nonrandom utilization of codon pairs in Escherichia coli. Proc Natl Acad Sci USA. 1989, 86: 3699-3703.
- Boycheva S, Chkodrov G, Ivanov I: Codon pairs in the genome of Escherichia coli. Bioinformatics. 2003, 19: 987-998.
- Boycheva SS, Bachvarov BI, Berzal-Heranz A, Ivanov IG: Effect of 3' terminal codon pairs with different frequency of occurrence on the expression of cat gene in Escherichia coli. Curr Microbiol. 2004, 48: 97-101.
- Wu G, Dress L, Freeland SJ: Optimal encoding rules for synthetic genes: the need for a community effort. Mol Syst Biol. 2007, 3: 134.
- Wu G, Zheng Y, Qureshi I, Zin HT, Beck T, Bulka B, Freeland SJ: SGDB: a database of synthetic genes re-designed for optimizing protein over-expression. Nucleic Acids Res. 2007, 35: D76-79.
- Villalobos A, Ness JE, Gustafsson C, Minshull J, Govindarajan S: Gene Designer: a synthetic biology tool for constructing artificial DNA segments. BMC Bioinformatics. 2006, 7: 285.
- de Smit MH, van Duin J: Control of prokaryotic translational initiation by mRNA secondary structure. Prog Nucleic Acid Res Mol Biol. 1990, 38: 1-35.
- Jin H, Zhao Q, Gonzalez de Valdivia EI, Ardell DH, Stenstrom M, Isaksson LA: Influences on gene expression in vivo by a Shine-Dalgarno sequence. Mol Microbiol. 2006, 60: 480-492.
- Lorimer D, Raymond A, Walchli J, Mixon M, Barrow A, Wallace E, Grice R, Burgin A, Stewart L: Gene Composer: Database Software for Protein Construct Design, Codon Engineering, and Gene Synthesis. BMC Biotechnology. 2009, 9: 36.
- Roosild TP, Vega M, Castronovo S, Choe S: Characterization of the family of Mistic homologues. BMC Struct Biol. 2006, 6: 10.
- Kefala G, Kwiatkowski W, Esquivies L, Maslennikov I, Choe S: Application of Mistic to improving the expression and membrane integration of histidine kinase receptors from Escherichia coli. J Struct Funct Genomics. 2007, 8: 167-172.
- Cordell SC, Robinson EJH, Löwe J: Crystal structure of the SOS cell division inhibitor SulA and in complex with FtsZ. Proc Natl Acad Sci USA. 2003, 100: 7889-7894.
- Oliva MA, Cordell SC, Löwe J: Structural insights into FtsZ protofilament formation. Nat Struct Mol Biol. 2004, 11: 1243-1250.
- Mukherjee A, Dai K, Lutkenhaus J: Escherichia coli cell division protein FtsZ is a guanine nucleotide binding protein. Proc Natl Acad Sci USA. 1993, 90: 1053-1057.
- Erickson HP: Atomic structures of tubulin and FtsZ. Trends Cell Biol. 1998, 8: 133-137.
- Hale RS, Thompson G: Codon optimization of the gene encoding a domain from human type 1 neurofibromin protein results in a threefold improvement in expression level in Escherichia coli. Protein Expr Purif. 1998, 12: 185-188.
- Löwe J, Amos LA: Crystal structure of the bacterial cell-division protein FtsZ. Nature. 1998, 391: 203-206.
- Harding MM: Geometry of metal-ligand interactions in proteins. Acta Crystallogr D Biol Crystallogr. 2001, 57: 401-411.
- Nayal M, Di Cera E: Valence screening of water in protein crystals reveals potential Na+ binding sites. J Mol Biol. 1996, 256: 228-234.
- Harding MM: Metal-ligand geometry relevant to proteins and in proteins: sodium and potassium. Acta Crystallogr D Biol Crystallogr. 2002, 58: 872-874.
- Studier FW: Protein production by auto-induction in high density shaking cultures. Protein Expr Purif. 2005, 41: 207-234.
- Madin K, Sawasaki T, Ogasawara T, Endo Y: A highly efficient and robust cell-free protein synthesis system prepared from wheat embryos: plants apparently contain a suicide system directed at ribosomes. Proc Natl Acad Sci USA. 2000, 97: 559-564.
- Sawasaki T, Hasegawa Y, Tsuchimochi M, Kamura N, Ogasawara T, Kuroita T, Endo Y: A bilayer cell-free protein synthesis system for high-throughput screening of gene products. FEBS Lett. 2002, 514: 102-105.
- Otwinowski Z, Minor W: Processing of x-ray diffraction data collected in oscillation mode. Methods Enzymol. 1997, 276: 307-326.
- Murshudov GN, Vagin AA, Dodson EJ: Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr. 1997, 53: 240-255.
- Jones TA, Zou JY, Cowan SW, Kjeldgaard M: Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr A. 1991, 47: 110-119.
- Oliva MA, Trambaiolo D, Löwe J: Structural insights into the conformational variability of FtsZ. J Mol Biol. 2007, 373: 1229-1242.
- Lamzin VS, Wilson KS: Automated refinement for protein crystallography. Methods Enzymol. 1997, 277: 269-305.
- Emsley P, Cowtan K: Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004, 60: 2126-2132.
- Carson M: Ribbons. Methods Enzymol. 1997, 277: 493-505.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.