Estimating the number of integrations in transformed plants by quantitative real-time PCR

Background When generating transformed plants, a first step in their characterization is to obtain, for each new line, an estimate of how many copies of the transgene have been integrated in the plant genome because this can deeply influence the level of transgene expression and the ease of stabilizing expression in following generations. This task is normally achieved by Southern analysis, a procedure that requires relatively large amounts of plant material and is both costly and labour-intensive. Moreover, in the presence of rearranged copies the estimates are not correct. New approaches to the problem could be of great help for plant biotechnologists. Results By using a quantitative real-time PCR method that requires limited preliminary optimisation steps, we achieved statistically significant estimates of 1, 2 and 3 copies of a transgene in the primary transformants. Furthermore, by estimating the copy number of both the gene of interest and the selectable marker gene, we show that rearrangements of the T-DNA are not the exception, and probably happen more often than usually recognised. Conclusions We have developed a rapid and reliable method to estimate the number of integrated copies following genetic transformation. Unlike other similar procedures, this method is not dependent on identical amplification efficiency between the PCR systems used and does not need preliminary information on a calibrator. Its flexibility makes it appropriate in those situations where an accurate optimisation of all reaction components is impossible or impractical. Finally, the quality of the information produced is higher than what can be obtained by Southern blot analysis.

ing many primary transformants (T 0 ) resides in the mechanism of integration itself: since the new DNA is inserted at random in the plant genome, plants with one to several integrated copies are obtained, and the multiple copies can be found in one or more chromosome locations. Usually plants where one or two integration events have occurred are those with the highest level of expression of the new gene. Low and sometimes unstable expression of transgenes has been related to high copy number and subsequent transgene silencing [2,3]. It is therefore clear that the T 0 plantlets have to be analysed as soon as possible, so that only the most interesting ones are taken through the steps of acclimatization in soil, flowering, seed production, etc.
Transgene copy number is usually estimated by Southern analysis, a classic molecular biology method. This procedure provides an indication of the number of integrated copies, but is quite costly in terms of reagents, labour, and time, and it requires a relatively large amount of plant material to start with. Moreover, in the presence of rearranged copies (with loss of restriction sites), the estimates are not correct.
In this report we used a different strategy to estimate transgene copy number in T 0 plants, quantitative real-time PCR, and compared the results with those of Southern analysis.
Real-time PCR has made it possible to accurately quantify starting amounts of nucleic acid during the PCR reaction without the need for post-PCR analyses. A fluorescent reporter is used to monitor the PCR reaction as it occurs. The reporter can be of a nonspecific nature (such as SYBR Green I) or of a specific nature (such as TaqMan probes, molecular beacons, FRET probes). In a TaqMan assay [4], the probe is labeled at the 5' end with a fluorescent reporter molecule and at the 3' end with another fluorescent molecule, which acts as a quencher for the reporter. When the two fluorophores are fixed at opposite ends of the 20-30 nt probe and the reporter fluorophore is excited by an outside light source, the normal fluorescence of the reporter is absorbed by the nearby quencher, and no reporter fluorescence is detected. When Taq polymerase encounters the bound probe during extension from one of the primers, it digests the probe by its 5' exonuclease activity, freeing the reporter from the quencher, and the reporter fluorescence can be detected and measured [5]. The fluorescence of the reporter molecule increases as products accumulate with each successive round of amplification.
With real-time PCR, results can be obtained quickly and can be subjected to statistical analysis. Since quantitative data on distinct sequences of the T-DNA can be obtained, lines with possible rearrangements are much easier to recognize than with Southern analysis.

Results and Discussion
In analysing transgenic tomato plants three genes were considered, one endogenous (tomato ascorbate peroxidase, apx) and two transgenic ones, the nucleocapsid gene of Tomato spotted wilt virus (TSWV-N) and the neomycin phosphotransferase II gene (nptII). A total of six experiments (two experiments for each gene) were conducted and standard curves, obtained from serial dilutions of a transgenic line, were produced using the Bio-Rad iCycler software. In all experiments samples were run in triplicate. The correlation coefficients of the standard curves were good, ranging from 0.990 to 0.997. Representative curves for apx, TSWV-N and nptII are shown in Figures 1, 2 and 3, respectively.
From the standard curves the starting quantities of each gene in each tomato line were determined, again with the iCycler software. The results of the two experiments (six measurements) conducted on each gene were combined and are shown in Table 1 together with their 95% confidence intervals. Most of the data fall inside the upper and lower limits of the standard curves, except one case for apx, and three cases for TSWV-N and nptII. Since the real-time PCR technique is known for producing a linear response over a wide range of starting concentrations [6], and these few data are not too distant from the extreme values used to produce the standard curves, we considered them acceptable.
To estimate the number of transgene copies in each tomato line, the ratio between transgenic and endogenous starting quantities (r line ) was calculated for each line (Table 2), and from these r line values the "virtual calibrator" r 1 was calculated, which is the value of this ratio corresponding to one copy of the transgene. The virtual calibrator is a weighted combination of all the lines under study, the weight given to each line depending on the accuracy of the determination of r line . This procedure does not require a real "calibrator" line identified with an independent test. As a consequence, it allows copy number to be analysed even in the absence of previous knowledge of the transgenic lines, provided that at least one of the lines analysed contains one copy.
The copy number for each line was determined as r line /r 1 (see Table 2). Of course the copy number determined in this way is a real number: our estimate for the actual, integer copy number is the range of integers that are included in the 95% confidence interval around the ratio r line /r 1 . In the case of TSWV-N, this range included at least one integer in all lines. For two lines (110-1 and 80-1) in the case of the nptII transgene there is actually no integer included in this interval, so we quote as our estimate of the copy number the integer closest to r line /r 1 .
By comparing the results for each line (see Table 3), it appears that in some cases the number of integrated copies of the two transgenes is the same, but in others it is not. This indicates that rearrangements have occurred in the T-DNA during the process of integration in plant chromosomes, and the integrity of the transformation cassette, which included both TSWV-N and nptII genes, was not preserved. This is the case for lines 1-2 c, 46-1, 127-1, 1-1 and 118-2. Line 118-2 appears as an extreme case, where 5 copies of nptII gene have been integrated, but not even one copy of TSWV-N gene. In 5 lines the integration appears to have occurred without loss of one of the transgenes: 111-6, 99-1, 110-1, 80-1 and 133-1. For the remaining four lines, all with estimates of 3 or more copies, no conclusion can be drawn. All this information on rearrangements due to loss of one of the transgenes is usually not available when classical Southern analysis is performed, since normally only the transgene of interest is considered, and not the selectable marker gene.
When a line was analysed both as primary transformant, which carries the new DNA only on one allele, and as homozygous progeny, carrying it on both alleles, the values obtained for the progeny were, as expected, twice those obtained for the T 0 . This represents a fast and easy way to [...]
In the case of the TSWV-N gene, data from real-time PCR were compared with data from Southern analysis (Table 4). Initially, the DNAs from the lines were digested with KpnI, a single-cutter in the T-DNA, and analysed with a TSWV-N-specific probe. The results of this first analysis indicated that the two techniques agreed only in 3 cases (111-6, 1-2 c and 118-2). In an attempt to understand the causes of such discrepancies, a deeper Southern analysis was therefore performed, cutting the plant DNAs with other restriction enzymes, again single-cutters, but located in different parts of the T-DNA [...] insertion of more than one T-DNA copy in one locus, deletions of sequences outside the coding parts of genes but containing the restriction site used for the analysis (actually, KpnI is located between the 3'-end of the TSWV-N gene and the left border), generation of DNA fragments of very similar size that are not resolved on the gels, etc.
For 3 lines (99-1, 1-2 c and 127-1) the real-time PCR estimates were lower than those produced by the Southerns. In those cases deletions or rearrangements probably affected the short 79-bp sequence recognized by the real-time PCR primers and probe. This sequence does not include recognition sites of the enzymes used in Southern analysis. Alternatively, partial digests may have artifactually increased the estimates obtained by Southern analysis.
As a concluding remark on the comparison between the two methods, it is possible to state that multiple insertions, rearrangements, partial digests and a subjective evaluation of bands on the films are the most important causes that may produce a wrong estimation of copy number by Southern analysis. On the other hand, rearrangements are the only reason for real-time PCR to be unable to detect the presence of a copy of a gene.

Figure 3
Real-time PCR amplification and standard curve of nptII transgene. Upper panel: real-time PCR logarithmic plot resulting from the amplification of four three-fold serial dilutions of a tomato standard DNA (see Methods for details) using the nptII-specific primers and probe described in Table 5.

Conclusions
The standard curve is the key element for the quantitative assay: since it is based on the standard DNA used, the choice and the preparation of this DNA are extremely important. One of the proposed methods to prepare the standard DNA consists of mixing plant DNA with a plasmid [...] This approach, however, introduces several sources of error that cannot be controlled: previous absolute quantification of both plant and plasmid DNA is necessary, together with precise knowledge of the nuclear genome size of the plant species to be assayed. Unfortunately, for most plants only approximate estimates are available. All these problems were bypassed by simply taking the DNA of one transgenic line and using its dilutions to construct the standard curve. As shown above, results of the quantification are then used to build a "virtual calibrator", and finally to estimate the transgene copy number for each line. As an alternative to the standard curve method, relative quantification can also be achieved with the method named "comparative CT" or "delta-deltaCT" [8]. This method has the advantage of not requiring the construction of a standard curve for each experiment, but requires a validation experiment to demonstrate that reaction efficiencies for transgenes (two in our case) and endogenous gene are identical or at least very close [9]. Since reaction efficiencies were good but not identical for the TSWV-N, nptII and apx systems, it would be incorrect to use this method without performing further optimization of the systems, such as testing several combinations of primer concentration, Mg concentration, etc., without any guarantee of finding the right conditions for the three systems. Furthermore, every time a new gene, whether transgene or endogenous, is studied, the extended optimization must be repeated.
The method used in the present work, when compared to other proposed methods [7,9], is more flexible, requiring the least amount of optimization and validation, and, when accompanied by statistical analysis, can be considered an efficient and reliable procedure for estimating the transgene copy number. It can also be used as an indication of the integrity of the DNA transferred in the transgenic lines. In fact, by measuring not only the TSWV-N, but also a second trait (the nptII in our case) present in the T-DNA, we demonstrated that in several lines the transformation cassette must have undergone some kind of modification during the integration process in the plant nuclear DNA. Therefore rearrangements during integration appear to be relatively frequent events, and may never be recognized if only a limited molecular analysis, such as a single Southern blot, is performed.
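For reference, the comparative CT calculation mentioned above is usually written as 2^(-ΔΔCT). The sketch below illustrates that formula only and is not the method used in this work. The function and argument names are illustrative assumptions, and the calculation presumes that product doubles at every cycle in both the transgene and the endogenous systems, which is the condition the validation experiment has to establish.

```python
def relative_quantity_ddct(ct_target_sample, ct_ref_sample,
                           ct_target_calibrator, ct_ref_calibrator):
    """Relative amount of target in a sample versus a calibrator line.

    Assumes equal, perfect (doubling) efficiency for both PCR systems.
    """
    # Delta CT: target normalised to the endogenous reference, per line
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Delta-delta CT: sample relative to calibrator
    ddct = delta_sample - delta_calibrator
    return 2.0 ** (-ddct)


# Hypothetical CT values: the sample crosses threshold one cycle earlier
# than the calibrator after normalisation, i.e. twice the relative amount.
fold = relative_quantity_ddct(24.0, 20.0, 25.0, 20.0)
```

If the efficiencies of the two systems differ, the fixed base of 2 no longer applies to both, which is why the text requires a validation experiment before this shortcut is used.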

Transgenic plants and DNA preparation
The transgenic tomato plants used in this work were generated via Agrobacterium tumefaciens-mediated transformation [10] with a binary plasmid containing in its T-DNA an expression cassette for the TSWV-N gene, together with a second cassette for expression of the nptII gene as selectable marker [11]. The genomic DNA was isolated from the primary transformant (T 0 ) lines, using 1 g of leaf material, with the CTAB method as described in [12], and was not quantified. In the case of line 110-1, homozygous T 2 progeny (Hom. 110-1) was also analysed.

Outline of the method
For monitoring the real-time PCR reactions we used the Bio-Rad iCycler System, with specific fluorescent oligonucleotide probes (TaqMan probes, PE Biosystems) [4]. This assay exploits the 5' exonuclease activity of Taq polymerase to cleave a labeled hybridization probe during the extension phase of PCR [5]. The fluorescence of the reporter molecule increases as products accumulate with each successive round of amplification. The point at which the fluorescence rises appreciably above the background has been called the threshold cycle (TC), and there is a linear relationship between the log of the starting amount of a template and its TC during real-time PCR. Given known starting amounts of the target nucleic acid, a standard curve can be constructed by plotting the log of starting amounts versus the corresponding TCs. This standard curve can then be used to determine the starting amount for each unknown template based on its TC and the efficiency of reaction.
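The standard-curve logic described above can be sketched as follows. This is a minimal illustration, not the iCycler software's implementation: the function names are hypothetical, the dilution series and TC values are invented for a perfectly efficient reaction, and the efficiency formula E = 10^(-1/slope) - 1 is the conventional one rather than something stated in this paper.

```python
import math


def fit_standard_curve(quantities, threshold_cycles):
    """Least-squares fit of TC = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(threshold_cycles) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xs, threshold_cycles))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def starting_quantity(tc, slope, intercept):
    """Invert the standard curve: starting amount for a measured TC."""
    return 10 ** ((tc - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means doubling at every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical three-fold dilution series in arbitrary units (a.u.)
dilutions = [27.0, 9.0, 3.0, 1.0]
# Hypothetical TC values built for a reaction that doubles every cycle
tcs = [25.0 + math.log2(27.0 / q) for q in dilutions]

slope, intercept = fit_standard_curve(dilutions, tcs)
unknown = starting_quantity(tcs[1], slope, intercept)  # recovers about 9 a.u.
```

Because only ratios of starting quantities are used later, the units of the dilution series can stay arbitrary, as in the text.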
For the purposes of our experiments we used a standard DNA stock solution extracted from one of the transgenic tomato lines. The DNA concentration of this solution was approximately 300 ng/µl (estimated by UV-spectrophotometry). However, since this measurement is not precise per se, and, in the case of relative quantitation, this data is not relevant, we preferred to use arbitrary units (a.u.) in this work. From the standard DNA stock solution accurate three-fold serial dilutions were prepared and utilized to obtain the standard curves necessary for relative quantification of an endogenous gene and two transgenes.
For quantitation normalized to an endogenous control, standard curves are prepared for both the transgene and the endogenous gene. For each tomato line to be tested (experimental sample), the amount of transgene and endogenous gene is determined from the appropriate standard curve. Then the amount of transgene is divided by the amount of endogenous gene and a normalized transgene value is obtained (r line ).

Primers and probes
Three systems were developed, the first for the apx tomato endogenous gene to quantitate tomato DNA; the others for the transgenes TSWV-N and nptII. Primers and TaqMan probes were designed on the basis of sequences present in the GenBank database. The sequences and sizes of amplicons are detailed in Table 5. The 5' and 3' ends of the probes were labelled with fluorescent dyes FAM (6-carboxyfluorescein, excitation wavelength = 494 nm, emission wavelength = 521 nm) and TAMRA (6-carboxytetramethyl-rhodamine), respectively. All primers and probes were synthesized by Eurogentec (Belgium).

Real-time PCR reactions
The real-time PCR reactions were performed in the iCycler iQ Real-Time PCR Detection System (Bio-Rad Laboratories, USA) and were carried out in 96-well reaction plates. PCR reactions consisted of 1 × Platinum Quantitative PCR SuperMix UDG (Life Technologies-Invitrogen), which contains dUTP and the enzyme uracil-N-glycosylase (UNG) to prevent contamination deriving from previous PCR reactions, 2 µM of specific TaqMan probe, specific [...] The cycling parameters used were as follows: one cycle at 50°C for 3 min for activation of UNG, one cycle at 95°C for 5 min for DNA polymerase activation, and 45 cycles of 95°C for 15 sec (denaturation) and 60°C for 1 min (annealing and extension). All reactions were run in triplicate.

Optimization of primer concentrations
For each primer pair, the primer concentrations were optimized in preliminary experiments, to account for unpredictable differences in annealing efficiency. PCR reactions were run with different combinations of primer concentrations. All nine combinations of 75, 150 and 300 nM (final concentrations) were tested for the TSWV-N and the nptII systems. Similarly, all combinations of 150, 300 and 600 nM were tested for the apx system.
For each system, the lowest concentrations of forward and reverse primers giving a high endpoint fluorescence and a low TC value were chosen as optimal: 300 nM of Q-TSWVN-492(+) and 150 nM of Q-TSWVN-570(-) for the TSWV-N system; 150 nM of each primer for the nptII system; 600 nM of each primer for the apx system.
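The primer matrix tested for each system can be enumerated with a short sketch. The function name is hypothetical; the concentrations are those given in the text, and each pair is read as (forward, reverse) final concentration in nM.

```python
from itertools import product


def primer_matrix(concentrations_nM):
    """All (forward, reverse) primer concentration pairs, in nM."""
    return list(product(concentrations_nM, repeat=2))


# Grids described in the text: 3 x 3 = 9 combinations per system
transgene_grid = primer_matrix([75, 150, 300])   # TSWV-N and nptII systems
apx_grid = primer_matrix([150, 300, 600])        # apx system

for forward, reverse in transgene_grid:
    print(f"forward {forward} nM / reverse {reverse} nM")
```

The pairs finally chosen in the text, for example (300, 150) for TSWV-N, are members of these grids.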

Calculation of copy number and statistical analysis
For each line and each gene we had 6 evaluations of the starting quantity, except for the TSWV-N of line 97-1, for which we had only 5 evaluations, since one experimental point was obviously flawed. The uncertainties on the starting quantities, corresponding to the 95% confidence interval, were evaluated by using the t-distribution with 5 (or 4) degrees of freedom. From the starting quantities we constructed, for each line and for the two transgenes, the ratio:
$$r_{\text{line}} = \frac{\text{starting quantity of transgene}}{\text{starting quantity of endogenous gene}}$$
Such ratios are proportional to the copy number of the transgene, since the endogenous gene is present in one copy.
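A minimal sketch of this step is given below. The critical values 2.571 and 2.776 are the standard two-sided 95% values of Student's t for 5 and 4 degrees of freedom. Everything else is an assumption made for illustration: the function names, the sample measurements, and in particular the way the uncertainty on the ratio is obtained (first-order propagation of independent relative errors), which the text does not specify.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom
T_CRIT_95 = {4: 2.776, 5: 2.571}


def mean_with_ci(values):
    """Mean and 95% confidence half-width of replicate starting quantities."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width


def ratio_with_uncertainty(transgene_values, endogenous_values):
    """r_line and its uncertainty (assumed: independent relative errors)."""
    t_mean, t_half = mean_with_ci(transgene_values)
    e_mean, e_half = mean_with_ci(endogenous_values)
    r_line = t_mean / e_mean
    delta_r = r_line * math.sqrt((t_half / t_mean) ** 2
                                 + (e_half / e_mean) ** 2)
    return r_line, delta_r


# Hypothetical starting quantities (a.u.), six measurements per gene
transgene_sq = [2.0, 2.2, 1.8, 2.1, 1.9, 2.0]
endogenous_sq = [4.0, 4.4, 3.6, 4.2, 3.8, 4.0]
r_line, delta_r_line = ratio_with_uncertainty(transgene_sq, endogenous_sq)
```

With five measurements instead of six, the same functions pick the 4-degrees-of-freedom value automatically.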
To determine the copy number for each line, a possibility would be to choose a transgenic line whose copy number is known to be one as the calibrator; the r line ratio for the calibrator line (r cal ) would then be associated with one copy of transgene, therefore the copy numbers for the other lines would be determined as r line /r cal . However, this procedure is likely to produce biased results, since any fluctuation in the determination of the starting quantities for the calibrator line would affect the copy number for all the other lines. Therefore we chose a different strategy, based on the idea of constructing a "virtual calibrator" which takes into account all the available lines. Our aim is to determine a value r 1 , corresponding to copy number 1, in such a way that the copy numbers determined for all the lines are as close to integers as possible: this value will be used instead of the calibrator value r cal in determining the copy numbers of the various lines. For each possible value of r 1 we define the quantity
$$F(r_1) = \sum_{\text{lines}} \frac{\left[ r_{\text{line}}/r_1 - N(r_{\text{line}}/r_1) \right]^2}{(\delta r_{\text{line}})^2}$$
where N(r line /r 1 ) is the nearest integer to r line /r 1 and δr line is the uncertainty on r line .
F(r 1 ) gives a measure of how distant the determined copy numbers are from integer numbers, if r 1 is chosen as representing copy number 1. The denominator ensures that the lines that were measured with highest accuracy weigh more in the construction of the virtual calibrator. The best value of r 1 is the one for which the quantity F(r 1 ) reaches a minimum, meaning that the determined copy numbers are as close to integers as possible. In practice one should start with a value of r 1 higher than all the measured values of r line and gradually decrease r 1 until the first local minimum is found (if one were to explore still lower values of r 1 , more local minima would be found, which however should be discarded as they correspond to fractional copy numbers).
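The scan over r 1 can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the grid step, the starting factor of 1.1 and the sample ratios are all hypothetical. The sketch evaluates F(r 1 ) on a descending grid and lists the local minima in the order a downward scan meets them; it deliberately does not encode a stopping rule, so the choice among the listed minima is left to the caller.

```python
import math


def nearest_integer(x):
    """Nearest integer to x (halves round up), the N(.) of the text."""
    return math.floor(x + 0.5)


def f_value(r1, ratios, deltas):
    """F(r1): weighted squared distance of r_line/r1 from nearest integers."""
    total = 0.0
    for r_line, delta in zip(ratios, deltas):
        copies = r_line / r1
        total += (copies - nearest_integer(copies)) ** 2 / delta ** 2
    return total


def local_minima_descending(ratios, deltas, r1_stop,
                            step=0.001, start_factor=1.1):
    """Scan r1 downward from above max(ratios) to about r1_stop.

    Returns [(r1, F(r1)), ...] for grid points that are local minima,
    in the order in which the downward scan meets them.
    """
    if r1_stop <= 0:
        raise ValueError("r1_stop must be positive")
    start = start_factor * max(ratios)
    n_points = int((start - r1_stop) / step) + 1
    grid = [start - k * step for k in range(n_points)]
    values = [f_value(r1, ratios, deltas) for r1 in grid]
    minima = []
    for i in range(1, len(grid) - 1):
        if values[i] < values[i - 1] and values[i] <= values[i + 1]:
            minima.append((grid[i], values[i]))
    return minima


# Hypothetical lines carrying 1, 2 and 3 copies with a true r1 of 0.3
ratios = [0.3, 0.6, 0.9]
deltas = [0.03, 0.03, 0.03]
minima = local_minima_descending(ratios, deltas, r1_stop=0.1)
```

In this constructed example the list contains a minimum at r 1 ≈ 0.3 with F close to zero; minima at lower r 1 , such as the one near 0.15, are those the text discards as corresponding to fractional copy numbers.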
The values of r 1 were 0.31 for nptII and 0.21 for TSWV-N transgenes, respectively. Once r 1 has been determined, the copy number for each line is determined as r line /r 1 . In this way the copy number is determined as a real number: as an estimate of the actual, integer copy number we quote the range of integers that are included in the 95% confidence interval around the ratio r line /r 1 . For two lines in the case of the nptII transgene there is actually no integer included in such interval, so we quote as our estimate of the copy number the nearest integer to r line /r 1 .
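The final rule, quoting the integers inside the 95% interval and falling back to the nearest integer when the interval contains none, can be sketched as follows. The function name and the r line values are hypothetical; r 1 = 0.31 is the nptII value reported above. The sketch also assumes that the interval half-width on the copy number is δr line /r 1 , i.e. that r 1 itself is treated as exact, which the text does not state.

```python
import math


def copy_number_estimate(r_line, delta_r_line, r1):
    """Integers within the 95% interval around r_line/r1.

    Falls back to the single nearest integer if the interval holds none.
    Assumption: r1 is treated as exact when scaling the uncertainty.
    """
    centre = r_line / r1
    half_width = delta_r_line / r1
    low = max(math.ceil(centre - half_width), 0)   # copy number cannot be < 0
    high = math.floor(centre + half_width)
    candidates = list(range(low, high + 1))
    if candidates:
        return candidates
    return [math.floor(centre + 0.5)]


R1_NPTII = 0.31  # value of r1 reported in the text for nptII

single = copy_number_estimate(0.62, 0.05, R1_NPTII)    # interval holds one integer
fallback = copy_number_estimate(0.80, 0.03, R1_NPTII)  # interval holds none
wide = copy_number_estimate(1.40, 0.25, R1_NPTII)      # interval holds two
```

A returned list with more than one entry corresponds to the cases where the measurement cannot distinguish between adjacent copy numbers.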

Authors' contributions
GM carried out all real-time PCR experiments and edited the results. PP designed and performed the statistical analysis of the data. AMV carried out molecular analysis. GPA was involved in data analysis and was responsible for the coordination of the study. All authors participated in the design of the experiments, read and approved the final manuscript.