Multi-line split DNA synthesis: a novel combinatorial method to make high quality peptide libraries

Background We developed a method, based on combinatorial DNA synthesis, for making various high-quality random peptide libraries for evolutionary protein engineering. Results Split synthesis in codon units was performed with base mixtures optimally designed by a Genetic Algorithm program. It required only standard DNA synthesis reagents and three standard DNA synthesizers running in parallel. This multi-line split DNA synthesis (MLSDS) is realized simply by adding a mix-and-split step to the normal DNA synthesis protocol. MLSDS was shown to be superior to other methods. We demonstrated the synthesis of oligonucleotide libraries with 10^16 diversity and the construction of a library of random sequences coding 120 amino acids and containing few stop codons. Conclusions Owing to the flexibility of the MLSDS method, it should be possible to design various "rational" libraries by using bioinformatics databases.


Background
Combinatorial synthesis has demonstrated its effectiveness in discovering novel functional molecules. In evolutionary protein engineering, examples include the selection of novel functional peptides from random libraries on solid supports [1], by phage display [2], or by in vitro virus (also called RNA-peptide fusion or mRNA display) [3][4][5]. The efficiency of these methods depends on the screening technique employed and on the quality of the library. In the display methods, a library of polynucleotide templates must be prepared to obtain a random peptide library. A primitive random library of such templates is (NNN)_n, where N is an equimolar mixture of A, T, G and C. This library produces prematurely truncated peptides and a particular bias in amino acid composition, which biases the sequence space that can effectively be searched. The slightly improved NNK and NNS libraries (K = equimolar mixture of G and T; S = equimolar mixture of G and C) have conventionally been used. Several methods have been developed to improve libraries further, including various "rational" libraries in which the nucleotide mixtures were optimized by computer calculation for a target amino acid composition [6][7][8].
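As a worked illustration of the stop-codon problem with degenerate codons, the short Python sketch below enumerates every codon allowed by NNN, NNK and NNS and counts stop codons against the standard genetic code. It assumes equimolar mixing and the standard IUPAC meanings K = G/T and S = G/C; the function name `codon_stats` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

# IUPAC degenerate symbols used in the text, assumed to be equimolar mixtures.
IUPAC = {"N": "ACGT", "K": "GT", "S": "GC"}

def codon_stats(scheme):
    """Return (number of codons, stop-codon fraction, number of amino acids encoded)
    for a degenerate codon such as 'NNK'."""
    codons = ["".join(c) for c in product(*(IUPAC.get(s, s) for s in scheme))]
    stops = sum(CODE[c] == "*" for c in codons)
    amino_acids = {CODE[c] for c in codons} - {"*"}
    return len(codons), stops / len(codons), len(amino_acids)

for scheme in ("NNN", "NNK", "NNS"):
    n, stop_frac, n_aa = codon_stats(scheme)
    print(f"{scheme}: {n} codons, stop fraction {stop_frac:.4f}, {n_aa} amino acids")
```

Under these assumptions NNN contains 3 stop codons out of 64, while NNK and NNS each retain only TAG (1 of 32) and still encode all 20 amino acids. That residual stop codon is the kind of defect the codon-level design described below is meant to remove.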
Removing stop codons to obtain long ORFs is important for the evolutionary design of a novel protein starting from a random library. Several methods based on random block ligation have been reported [9,10]. Two high-quality libraries that led to successful evolutionary protein design were made by the trinucleotide phosphoramidite (3NPs) method, which uses twenty pre-synthesized trinucleotide phosphoramidites [11][12][13][14], and by the preselection method, which uses mRNA display with a C-terminal affinity tag to remove stop codons [15].
In this article we report a convenient method, based on combinatorial DNA synthesis, for constructing high-quality libraries. These libraries contain few stop codons and have amino acid compositions optimized for various purposes. Random libraries based on split synthesis [1] are made routinely in combinatorial chemistry, but only a few methods [16,17] and a few applications [18,19] have been reported for oligonucleotide synthesis. These were used for mutagenesis, and their products were not of sufficient quality for evolutionary protein engineering. We applied split synthesis to oligodeoxyribonucleotide synthesis and developed a new procedure based on the synthesis of designed codon mixtures on multiple DNA synthesizers. Our method, Multi-Line Split DNA Synthesis (MLSDS), requires only standard reagents and three or four DNA synthesizers. MLSDS can make various "rational" libraries of huge diversity with few stop codons.

Results and Discussion
Adaptive design to the target amino acid composition
The scheme of the MLSDS method is shown in Fig. 1 and Table 1 and described in detail in the Methods section. MLSDS can remove not only stop codons but also any other chosen codons, so the codon composition itself can be designed. We also incorporated into the design the effect of single-nucleotide deletions that occur during ordinary oligonucleotide synthesis [20].
Designed biased libraries are useful for creating various novel proteins, such as a functional peptide without Cys [21] or an engineered protein without Met [22]. Unnatural codons and unnatural amino acids [23] could also be incorporated at a desired composition. Results from analyses of bioinformatics databases could likewise be used to design initial libraries with higher evolvability for experimental protein evolution. The optimum amino acid composition of the library may differ for each target protein. For example, to explore the global protein sequence space exhaustively, a uniform amino acid composition may be best.
To explore only a proven region of protein sequence space, the average amino acid composition of natural proteins [24] might be better in many respects. To design a protein with specific properties, a library with an increased or decreased fraction of particular amino acids should be constructed for each segment of the polypeptide chain.

Table 1. Scheme of the MLSDS method
Given this wide range of requirements, we designed DNA libraries that encode peptide libraries with various characteristics and no stop codons. Examples are a library with the average amino acid composition of natural proteins [24] (called the "Natural" library in this article), a library with a uniform amino acid composition, and a library with a uniform composition except [Cys] = 0. A library encoding only four kinds of amino acid (a c-Fos mutant library [26]) was also designed. The designed molar mixing ratios of A:T:G:C for some of these libraries are shown in Table 1.
Another interesting example was obtained when the target composition was "Uniform except [Met] = 0 and [Term] = 0". The designed molar mixing ratio of A:T:G:C gave a high fitness value F (0.96 with three-line splits) and produced no stop codons even when the effect of a point deletion was included in the GA calculation. A Met-less random library may be the best starting library for a global search of protein sequence space. This speculation is supported by a report [22] that a mutant dihydrofolate reductase in which all Met residues were replaced had much higher enzymatic activity than the wild type.
Internal deletion during oligonucleotide synthesis is an important problem. It destroys the codon-based design, generating stop codons and an undesirable amino acid composition. Our program incorporated deletion effects into the GA calculation and succeeded in minimizing the deletion problem. Moreover, it has been reported that contamination by deletion products can be reduced by denaturing PAGE for DNA of this length [15].
We also investigated how many DNA synthesizers are needed in practice. For this purpose, we calculated the final correlation coefficient between the designed composition and various target compositions using up to six DNA synthesizer lines. As shown in Fig. 2, the final correlation coefficient (the final fitness) saturated at about three or four lines with this program. Our GA program is not optimized to find the best possible F value, but it is suited to designing practical synthesis operations. These results showed that MLSDS gives a high-quality library even with three DNA synthesizers.
When we took the natural abundance as the target amino acid composition, we obtained the highest fitness value, F = 0.99 (with three lines), in the GA calculations. This is reasonable, because the average amino acid composition of natural proteins correlates highly with the number of synonymous codons in the standard genetic code table [25].

Synthesis of MLSDS libraries
We synthesized the "Natural" library and the "Uniform except [Cys] = 0" library described in the previous section. In Table 2 […]

Figure 2. Dependence of the final fitness on the number of DNA synthesizers for various libraries.

We also synthesized MLSDS products composed of only a few kinds of amino acid. It had been thought that such libraries could be made only by the 3NP method. A mutant c-Fos library containing only four kinds of amino acid, equivalent to a library synthesized by the 3NP method [26], was synthesized. It was a high-quality library (F = 1.00) (Table 2). So far, fifteen libraries with various amino acid compositions have been synthesized successfully.
To make long ORFs, we assembled 8 units of the oligomers, none of whose stem sequences contained stop codons. A DNA library encoding 120 amino acids plus nine 5'- and 3'-flanking semi-random dipeptides (138 amino acids in total) was constructed (Fig. 3).
The diversity of the synthesized library is about 10^16, judging from the mass (A260 data) and purity (PAGE data) of the synthesized DNA. In vivo selection limits diversity at the transformation step, whereas in vitro selection has no such limitation. Thus, exploring a huge sequence space by in vitro virus [3][4][5] or related techniques [28,29] will become possible, depending on the experimental cost.
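Because the number of distinct sequences cannot exceed the number of full-length molecules, a diversity estimate of this kind is essentially a molecule count. The sketch below shows that arithmetic under rule-of-thumb assumptions that are not taken from this article: about 33 µg of single-stranded DNA per A260 unit, about 330 g/mol per nucleotide, and a hypothetical full-length fraction standing in for the PAGE purity. Function names are illustrative.

```python
AVOGADRO = 6.022e23   # molecules per mole
NT_MASS = 330.0       # assumed average ssDNA mass per nucleotide, g/mol (rule of thumb)
UG_PER_A260 = 33.0    # assumed ssDNA conversion: 1 A260 unit is about 33 micrograms

def molecules_from_a260(a260_units, length_nt, purity):
    """Estimate full-length molecules from total A260 units and the full-length
    fraction (e.g. judged from PAGE); this bounds the number of distinct sequences."""
    grams = a260_units * UG_PER_A260 * 1e-6
    return grams / (length_nt * NT_MASS) * AVOGADRO * purity

def a260_units_needed(n_molecules, length_nt, purity):
    """Inverse of molecules_from_a260: A260 units carrying n_molecules full-length strands."""
    grams = n_molecules / (AVOGADRO * purity) * length_nt * NT_MASS
    return grams * 1e6 / UG_PER_A260

# 10^16 molecules of the 87-mer at a hypothetical 50% full-length purity.
print(round(a260_units_needed(1e16, 87, 0.5), 1))
```

With these assumed constants, roughly 14 to 15 A260 units of pure full-length 87-mer correspond to 10^16 molecules, which gives a sense of the synthesis scale that a diversity of this order implies.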

Comparison of MLSDS with other methods
So far, truly random libraries have been generated by four methods. Other methods do not give a truly random library, because they cannot encode all 20 amino acids at every site. A comparison of library quality for three methods is shown in Table 3.
Applications of the 3NPs method to the mutagenesis of antibodies [27] and coiled coils [30] gave good results. However, twenty kinds of 3NPs means only one codon per amino acid, whereas the genetic code is degenerate, so the 3NPs method leaves many tRNAs unused. A calculation of translation efficiency based on codon usage gave up to a 4-fold decrease in Triticum aestivum. It has also been reported that the reaction efficiencies of 3NPs are far from uniform. Sequence data for DNA synthesized with an equimolar mixture of 19 kinds of 3NPs (without Cys) showed up to a 12-fold difference in composition [27], or more [12]. The coupling yield was affected by the mixing ratio of 3NPs and by the sequence context, showing up to an 8-fold difference for the same 3NP [27]. Thus it is difficult to compensate for the differences in reaction efficiency by adjusting the mixing ratio. The correlation coefficient between the target composition and the actual composition was about 0.4 (for a uniform composition of 19 amino acids) [27] (Table 3). The dimer phosphoramidite method [17] is a variation of the 3NPs method that also uses pre-synthesized amidites and has the same problems; bias was indeed observed [17].
The preselection method using mRNA display [15] has been fruitful in evolutionary protein design; novel peptide aptamers were evolved starting from a long-ORF random library [31,32]. However, this method could not remove all stop codons, and it gave limited library diversity. It also has low flexibility in amino acid composition; for example, it is difficult to generate a "Uniform except [Met] = 0" library. The correlation coefficient between the target composition and the actual composition was not very high (Table 3).
The Y-Ligation Block Shuffling (YLBS) method [9] has high potential for the evolutionary design of peptides, but it suffers from deletions and from the reaction bias of RNA ligase. The problems mentioned above are much less severe for the MLSDS method, because it uses only standard phosphoramidites and is free from biochemical biases such as those in mRNA display and YLBS. It has been reported that the differences in reaction efficiency within an equimolar mixture of the four mononucleotide phosphoramidites are only about 1-5% [33,34]. Like the 3NP method, MLSDS can create any specified amino acid composition, and an MLSDS library can be made at lower cost than libraries made with the other methods.

Conclusions
We applied split synthesis to oligodeoxyribonucleotide synthesis and developed a new procedure, Multi-Line Split DNA Synthesis (MLSDS), based on the synthesis of designed codon mixtures using three DNA synthesizers running in parallel. MLSDS can make various "rational" libraries of huge diversity with few stop codons by using bioinformatics databases. Combining an MLSDS library with a screening method that can handle huge diversity will accelerate protein evolution in vitro.

Notes to Table 3: The values were calculated using data for full-length (deletion-free) libraries. Values for the 3NPs and mRNA display methods were calculated under some assumptions. For the mRNA display method, the stop-codon values were estimated from data described in the original article, assuming that no cassette contained two or more stop codons and that stop codons appeared equally often at every position. For the β-Cassette polar target composition, polar amino acids are assumed to be equimolar and all others 0%.
a Correlation coefficient calculated from the target and actual amino acid compositions, including the contribution of stop codons. Stop codons are assumed to be absent from full-length libraries made by the 3NPs method.
b Sum of the absolute differences between the target and actual fraction of each amino acid.
c ND, no data.

Methods
A random MLSDS library was synthesized as follows. Standard DNA synthesis chemistry was run on three DNA synthesizers in parallel, and the randomized regions were synthesized combinatorially in codon units. Triplet codons were synthesized separately on the three synthesizers as an elongation of oligonucleotides on controlled pore glass (CPG) beads. The CPG beads were then mixed together manually and split again manually into three reaction tubes, and the next triplet codons were synthesized (Fig. 1).
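The mix-and-split cycle can be simulated to see why splitting into lines matters. The Python sketch below uses hypothetical per-line base mixtures (not the Table 1 values) chosen so that no line can produce a stop codon, even though the position-wise union of all three mixtures would allow TAA, TAG and TGA. It assumes an approximately equal random split of beads among lines each cycle and, as a simplification, models each bead as carrying a single molecule. All names are illustrative.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}

# Hypothetical per-line mixtures: base -> relative weight for codon positions 1, 2, 3.
LINES = [
    [{"A": 1, "G": 1, "C": 1}, {"A": 1, "T": 1, "G": 1, "C": 1}, {"A": 1, "T": 1, "G": 1, "C": 1}],
    [{"T": 1}, {"T": 1, "C": 1}, {"A": 1, "T": 1, "G": 1, "C": 1}],  # TTN, TCN
    [{"T": 1}, {"A": 1, "G": 1}, {"T": 1, "C": 1}],                  # TAY, TGY
]

def sample_codon(mix, rng):
    """Draw one codon from a line's three position-wise base mixtures."""
    return "".join(rng.choices(list(pos), weights=list(pos.values()))[0] for pos in mix)

def split_and_mix(lines, n_beads, n_codons, rng):
    """Each cycle, send every bead to a randomly chosen line, add one codon from that
    line's mixture, then pool all beads again before the next cycle."""
    beads = [""] * n_beads
    for _ in range(n_codons):
        for b in range(n_beads):
            beads[b] += sample_codon(rng.choice(lines), rng)
    return beads

def has_stop(seq):
    """True if any in-frame codon of seq is a stop codon."""
    return any(seq[i:i + 3] in STOPS for i in range(0, len(seq), 3))

library = split_and_mix(LINES, n_beads=1000, n_codons=15, rng=random.Random(0))
print(library[0], sum(has_stop(s) for s in library))
```

The design point is that each line carries its own position-wise mixture, so combinations that would be forced in a single mixed pool (T at position 1, A at position 2, A or G at position 3) never occur together on one line.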
The sequence of the 87-mer library was 5'-GAT GAG GCG AAG ACG NAC TGS (123/456/789)_15 NAC TGS GAG GCT GGC TGC CAC-3', where N and S denote A/T/G/C and G/C, respectively, and (123/456/789)_15 denotes 15 randomized codons, each synthesized on one of the three lines (codon groups 123, 456 and 789). The A:T:G:C mixing ratio at each position of the three codon groups is shown in Table 1; these values were calculated as described below. The 5' and 3' flanking regions contain the recognition sequences of the type IIS restriction enzymes BbsI and BbvI, respectively. To make longer sequences, we ligated 2 to 8 units of oligomers at the cohesive ends (the underlined sequences shown above) generated by restriction enzyme treatment. The assembly method was as described in Ref. [16]. The italicized sequence shown above represents the assembly unit (a random region of 45 bp and a flanking semi-random linking region of (6+6)/2 bp). For example, when L = 3 (three synthesizer lines), the expected alanine composition p_1 of the full-length sequence without deletion is given by

$$p_1 = \frac{1}{3}\sum_{l=1}^{3} f_{l,1}(\mathrm{G})\, f_{l,2}(\mathrm{C}),$$

where f_{l,j}(b) is the fraction of base b at codon position j in line l and each line receives one third of the beads. Because alanine is encoded by GCN, the third codon position does not enter.
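The alanine formula generalizes to every amino acid: for each line, multiply the base fractions at the three codon positions for every codon, translate, and average over the lines. The Python sketch below does this for a deletion-free codon, assuming equal bead splits among lines and mixing weights stored in A:T:G:C order. The three-line design `x` is hypothetical (not the Table 1 values), and the function name is illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product("TCAG", repeat=3))}
ORDER = "ATGC"  # mixing weights are listed in A:T:G:C order, as in the text

def expected_composition(x):
    """Expected amino acid (and '*' stop) composition of one deletion-free random codon,
    assuming equal bead splits. x[l][j] holds the A:T:G:C weights of position j in line l."""
    n_lines = len(x)
    comp = {}
    for line in x:
        # f[j][b]: fraction of base b at codon position j in this line
        f = [{b: w / sum(pos) for b, w in zip(ORDER, pos)} for pos in line]
        for c in product(ORDER, repeat=3):
            aa = CODE["".join(c)]
            comp[aa] = comp.get(aa, 0.0) + f[0][c[0]] * f[1][c[1]] * f[2][c[2]] / n_lines
    return comp

# Hypothetical three-line design.
x = [
    [[0, 0, 5, 0], [0, 0, 0, 5], [1, 1, 1, 1]],  # line 1: G, C, N -> always GCN (Ala)
    [[5, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]],  # line 2: A, N, N -> never Ala
    [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],  # line 3: NNN -> Ala with probability 1/16
]
comp = expected_composition(x)  # comp["A"] is alanine (one-letter code)
# Direct check of p_1 = (1/3) * sum over lines of f(G at position 1) * f(C at position 2)
p1 = sum((line[0][2] / sum(line[0])) * (line[1][3] / sum(line[1])) for line in x) / len(x)
print(round(comp["A"], 4), round(p1, 4))
```

Both routes give the same value here, (1 + 0 + 1/16)/3 = 17/48, which is a quick consistency check between the general calculation and the alanine special case.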
Figure 3. Production of 2×, 4×, and 8× length library.

We solved an integer-programming problem (a 6-valued, 12L-dimensional optimization problem) whose solution components x_i take integer values 0, 1, 2, 3, 4 or 5. Restricting x_i to six integer levels simplified both the handling of the DNA synthesizers and the calculation. As the fitness F of x in the GA, we took the correlation coefficient between the expected (designed) amino acid composition p and the target amino acid composition t:

$$F = \frac{\sum_{k=1}^{N}(p_k-\bar{p})(t_k-\bar{t})}{\sqrt{\sum_{k=1}^{N}(p_k-\bar{p})^2}\,\sqrt{\sum_{k=1}^{N}(t_k-\bar{t})^2}},$$

where N = 21 (the 20 amino acids plus stop codons) in our normal case. The optimum x, which gave the maximum fitness F, was found using a simple GA program.
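A minimal sketch of this optimization is shown below in Python. It is an assumed, generic elitist GA (truncation selection, uniform crossover, per-gene mutation), not the authors' program; population size, generation count and mutation rate are arbitrary illustrative choices. Each candidate x is a flat list of 12L integers from 0 to 5 (A:T:G:C weights for three codon positions per line), the composition assumes equal bead splits and no deletions, and F is the Pearson correlation over N = 21 categories. Vectors in which some position has no base at all are treated as infeasible.

```python
import random
from itertools import product
from math import sqrt

AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product("TCAG", repeat=3))}
SYMBOLS = sorted(set(AMINO))  # 20 amino acids plus '*' (stop): N = 21 categories
ORDER = "ATGC"                # weights stored in A:T:G:C order

def composition(x, n_lines):
    """Expected composition ordered as SYMBOLS, or None if a position has no base."""
    comp = dict.fromkeys(SYMBOLS, 0.0)
    for line_idx in range(n_lines):
        pos = [x[12 * line_idx + 4 * j:12 * line_idx + 4 * j + 4] for j in range(3)]
        if any(sum(w) == 0 for w in pos):
            return None
        f = [{b: w / sum(ws) for b, w in zip(ORDER, ws)} for ws in pos]
        for c in product(ORDER, repeat=3):
            comp[CODE["".join(c)]] += f[0][c[0]] * f[1][c[1]] * f[2][c[2]] / n_lines
    return [comp[s] for s in SYMBOLS]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else -1.0

def fitness(x, target, n_lines):
    comp = composition(x, n_lines)
    return -1.0 if comp is None else pearson(comp, target)

def run_ga(target, n_lines=3, pop_size=60, generations=100, p_mut=0.02, seed=0):
    """Elitist GA over integer vectors whose genes lie in 0..5."""
    rng = random.Random(seed)
    dim = 12 * n_lines
    pop = [[rng.randint(0, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda v: fitness(v, target, n_lines), reverse=True)
        elite = ranked[:pop_size // 5]  # the best 20% survive unchanged
        pop = [list(e) for e in elite]
        while len(pop) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [rng.randint(0, 5) if rng.random() < p_mut else g for g in child]  # mutation
            pop.append(child)
    best = max(pop, key=lambda v: fitness(v, target, n_lines))
    return best, fitness(best, target, n_lines)

# Example target: uniform over the 20 amino acids, no stop codons.
target = [0.0 if s == "*" else 1 / 20 for s in SYMBOLS]
best, F = run_ga(target)
print(round(F, 3), best)
```

Elitism keeps the best fitness from decreasing between generations. A deletion-aware version would replace `composition` with a calculation that also mixes in frame-shifted codons.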
It has been reported that the deletion rate during general oligonucleotide synthesis is about 0.5% per coupling [20], and our data (about 0.3% per coupling) were consistent with this value. We incorporated the effect of single-nucleotide deletions into the GA calculation, considering only a single point deletion per synthesized oligonucleotide because the deletion rate is low enough. When a point deletion occurs in the 5' constant region, all the codons in the random region are read in a shifted frame. When it occurs at the i-th site of the random region, it affects the composition of everything downstream of the i-th site, and so on. We incorporated all these effects into the calculation of the composition. Details are described in Additional file 1.
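To give a feel for the per-coupling deletion rates quoted above, the Python sketch below assumes that deletions occur independently at each coupling (a binomial model, which is a simplification) and that an 87-mer needs 86 couplings because its 3'-terminal nucleotide is already attached to the CPG support. It reports the fractions of molecules with no deletion, exactly one deletion, and more than one, and shows how a single deletion shifts every downstream codon. Function names are hypothetical.

```python
def deletion_profile(n_couplings, rate):
    """Fractions of molecules with 0, exactly 1, and more than 1 deletions, assuming
    each coupling independently skips a nucleotide with probability `rate`."""
    p0 = (1 - rate) ** n_couplings
    p1 = n_couplings * rate * (1 - rate) ** (n_couplings - 1)
    return p0, p1, 1 - p0 - p1

def codons_after_deletion(seq, i):
    """Delete the nucleotide at 0-based index i and read the remainder in codons."""
    shifted = seq[:i] + seq[i + 1:]
    return [shifted[k:k + 3] for k in range(0, len(shifted) - 2, 3)]

for rate in (0.005, 0.003):  # the reported rate [20] and the rate observed here
    p0, p1, pmulti = deletion_profile(86, rate)
    print(f"rate {rate:.3f}: none {p0:.3f}, single {p1:.3f}, multiple {pmulti:.3f}")

# A deletion at index 3 leaves the first codon intact and shifts everything after it.
print(codons_after_deletion("ATGGCTAAA", 3))  # ['ATG', 'CTA']
```

Under this model, single deletions account for most of the deletion-containing product at these rates, which is the regime in which a single-deletion correction in the composition calculation is most informative.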