Inteins are widely used in biotechnology to manipulate proteins. The capacity of inteins to break and form peptide bonds has been exploited to develop a range of methodologies from a routine procedure to purify proteins, to more advanced semi-synthetic applications that assemble proteins for NMR, crystallography and mechanistic studies. These uses and others have been recently reviewed [10, 11, 30].
The use of inteins as genetic tools is still an emerging area of research. Temperature-sensitive inteins [31–33], light-activated inteins, and inteins controlled by small molecules [35–37] provide a way to control protein activity by controlling when a protein that has been interrupted by an intein is spliced together to yield its mature form. Conditional protein assembly by inteins in vivo is a promising method to generate temperature-sensitive mutants and to target when and where a protein is functional in a cell.
Here we add a new role for inteins in genetic manipulations, as a genetic marker with the unique property of self-excision. The engineered intein marks the expression of the protein into which it is embedded and yet after splicing is not a part of the mature protein. We demonstrate in yeast the use of a Pch PRP8 His5-marked intein to deliver, in one step and in vivo, GFP within a targeted protein without disturbing the function of the protein. In principle, the method could be extended to other protein tags, and adapted to other organisms.
We created a family of genetically marked inteins based on the PRP8 intein of Penicillium chrysogenum. PRP8 inteins have an ancient heritage and are widespread [19, 38–40]. However, they are not found in S. cerevisiae. The Pch PRP8 intein and the native yeast VMA1 intein are sufficiently divergent (21% sequence identity) to preclude homologous recombination between the two during the transformation step. For comparison, the S. cerevisiae HIS3 and the S. pombe his5 genes share 58% sequence identity and yet do not recombine during integrative transformation. The sequence divergence, the fact that the Pch PRP8 intein is well characterized [9, 19, 20, 39], its high splicing efficiency across a broad range of temperatures, and its lack of a DNA binding domain led to its choice in our studies.
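The percent identity values quoted above depend on how the two sequences are aligned and on which columns are counted. The following is a minimal sketch of one common convention (identical residues divided by aligned, ungapped columns), assuming the two sequences have already been aligned to equal length. The function name and the toy sequences are hypothetical, and this is not necessarily the procedure used to obtain the 21% and 58% figures.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment and
    therefore have equal length (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues are identical
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (hypothetical sequences, not the intein genes):
toy = percent_identity("ACGT-ACGTA", "ACCTTAC-TA")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so reported identities are only comparable when the same denominator is used.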
The primary sequence of the Pch PRP8 mini-intein shares sequence similarity with other inteins. Thus its three-dimensional structure likely conforms to the canonical Hedgehog/Intein (HINT) protein scaffold [42–48]. The HINT module has a flat horseshoe shape with a series of β-strands forming the framework. The active site is centered at the interior base or "toe" of the horseshoe. Endonucleases, if present in the intein, form a separate domain located away from the active site region at the "heel" of the horseshoe.
Here we show that the Pch PRP8 intein, and presumably other inteins, has a remarkable capacity to accept foreign protein domains at the region of the structure where the homing endonuclease resides in full inteins. High splicing efficiency was observed for all of the inserted single domain proteins that were used as genetic markers. We attribute the success to several factors. An alignment of the Pch PRP8 intein with the Synechocystis sp. DnaB and the Mycobacterium xenopi GyrA inteins shows that the insertion site for our genetic markers is located in disordered loops in the crystal structures of both of these inteins [42, 44]. Thus the region of insertion is inherently flexible and unstructured. In addition, one shared feature of the proteins used as markers is that their N- and C-termini are proximal [22–25]. The inserted protein domain therefore exits the intein and returns to it at points that lie close together in space, which would tend to minimize strain in the structure of the intein. Since the majority of single domain proteins have some region of their two termini within 5 Å [49, 50] and termini are consistently found at the surface of the protein, we predict that many single domain proteins could be inserted into inteins without detriment to splicing efficiency.
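The proximity criterion can be screened computationally for a candidate marker protein. The sketch below assumes that atomic coordinates (in Å) for a short N-terminal region and a short C-terminal region have already been extracted from a structure file; the function names, the choice of regions, and the toy coordinates are hypothetical. It reports the smallest distance between any point in one region and any point in the other and compares it with the 5 Å cutoff.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # x, y, z in angstroms


def min_termini_distance(n_region: Sequence[Point],
                         c_region: Sequence[Point]) -> float:
    """Smallest Euclidean distance between any N-terminal-region point
    and any C-terminal-region point."""
    if not n_region or not c_region:
        raise ValueError("both regions need at least one coordinate")
    return min(math.dist(p, q) for p in n_region for q in c_region)


def termini_are_proximal(n_region: Sequence[Point],
                         c_region: Sequence[Point],
                         cutoff: float = 5.0) -> bool:
    """True if some part of the two terminal regions lies within `cutoff` Å."""
    return min_termini_distance(n_region, c_region) <= cutoff


# Toy coordinates (hypothetical, not taken from any real structure):
n_term = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
c_term = [(3.8, 4.0, 0.0), (10.0, 10.0, 10.0)]
closest = min_termini_distance(n_term, c_term)
proximal = termini_are_proximal(n_term, c_term)
```

How many terminal residues to include in each region, and whether to use all atoms or only Cα atoms, are choices left open here.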
Three enzymes and a transcription factor were used as genetic markers to modify the Pch PRP8 intein. In pilot experiments all four modified inteins showed high splicing efficiency, both in E. coli and yeast, and in yeast the proteins within the excised inteins were active and enabled growth under conditions of selection. However, when these inteins were used to deliver GFP within calmodulin, only the strain tagged with the Pch PRP8 His5-intein grew normally. Under selective conditions, transformation with the Pch PRP8 LexA-VP16-intein did not yield any transformants, and the G418R- and HygR-inteins yielded cells with poor growth and diminished splicing efficiency. The disruption of calmodulin function was surprising, given the pilot results, and is not understood. It is possible that the unspliced products that accumulate interfere with growth. We are actively investigating ways to boost the level of stable protein and the efficiency of splicing through changes in codon usage and the composition of the linkers between the various domains. In addition, given the success of the Pch PRP8 His5-intein, we are constructing other inteins with metabolic markers that may require lower levels of expression.
Our intein-based method to insert the gene encoding GFP within an ORF offers several advantages over other PCR-based methods available in yeast. Other methods [52–54] introduce a gene with its own promoter that must first be removed before the GFP is correctly in frame. Thus a second recombination event is required, either catalyzed by the induction of Cre-recombinase or by a rare natural recombination event that is isolated by genetic selection, to remove the genetic marker with its associated promoter. In addition, the Cre-recombinase based system leaves behind within the ORF a peptide sequence encoded by the remaining loxP site, a sequence that can disrupt activity (unpublished). The introduced gene also temporarily interrupts the expression of the target protein. With the intein-based method, integration and tagging occur in one step. Since the marked inteins are inserted in frame, expression of the target is only disrupted during recombination. Given these advantages, marked inteins show great promise and complement other available methods.
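The in-frame requirement can be checked on a candidate insertion cassette before it is amplified. The sketch below is an illustration only, assuming the cassette is supplied as a DNA string that begins at a codon boundary of the target ORF and that the standard stop codons apply; the function name and the toy sequences are hypothetical. It tests that the cassette length is a multiple of three and that no stop codon occurs in frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def keeps_reading_frame(insert_dna: str) -> bool:
    """True if the insert preserves the host ORF's reading frame.

    Two conditions are checked: the length is a multiple of three,
    and no in-frame codon of the insert is a stop codon.
    """
    seq = insert_dna.upper()
    if len(seq) % 3 != 0:
        return False  # would shift the downstream reading frame
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


# Toy inserts (hypothetical sequences):
ok_insert = keeps_reading_frame("ATGGCTTGT")    # ATG GCT TGT
stop_insert = keeps_reading_frame("ATGTAAGCT")  # ATG TAA GCT
shifted = keeps_reading_frame("ATGGC")          # length not divisible by 3
```

A check of this kind says nothing about splicing efficiency or protein stability; it only confirms that translation can read through the cassette into the downstream portion of the target.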