Quantifying 35 transcripts in a single tube: Model-based calibration of the GeXP RT-PCR assay

Quantitative analysis of differential gene expression is of central importance in molecular life sciences. The Gene eXpression Profiling technology (GeXP) relies on multiplex RT-PCR and subsequent capillary electrophoretic separation of the amplification products and allows to quantify the transcripts of up to approximately 35 genes with a single reaction and one dye. Here, we provide a kinetic model of primer binding and PCR product formation as the rational basis for taking and evaluating calibration curves. With the help of a purposeful designed data processing workflow supported by easy-to-use Perl scripts for calibration, data evaluation, and quality control, the calibration procedure and the model predictions were confirmed and the robustness and linearity of transcript quantification demonstrated for differentiating Physarum polycephalum plasmodial cells. We conclude that GeXP analysis is a robust, sensitive, and useful method when the transcripts of tens to few hundred genes are to be precisely quantified in a high number of samples.


Introduction
The quantitative analysis of differential gene expression is of central importance in cell, developmental, and systems biology. Established experimental approaches have specific strengthes and weaknesses and the method of choice depends on the required accuracy for quantitative detection, the available amount of RNA per sample, the number of different transcripts that are to be quantified in one sample, and the available budget [1,2]. Transcript quantification by the next generation sequencing RNAseq method, for example, allows to quantify the entire transcriptome with an accuracy depending on the number of reads analysed. However, the quantification of one sample is relatively expensive in terms of sequencing costs and requires sophisticated bioinformatic data processing [3,4]. Real-time quantitative RT-PCR on the other hand is relatively inexpensive and easy to perform but only gives the transcript concentration for upto few genes per sample. A different, appealing, and useful approach is the GeXP multiplex-RT-RCR developed by Beckman-Coulter 1 . This method relies on the simultaneous amplification of the cDNAs of a moderate number of genes in the same sample and the subsequent separation of the fluorescence-tagged amplification products by capillary electrophoresis. The number of transcripts that can be quanitfied in a single pot is limited by the number of fragments that can be thoroughly separated from each other. With the CEQ 8800 TM Genome Analysis System, an 8-capillary sequencer, at least 35 different transcirpts can be readily quantified using a single fluorescent dye [5].
The method is depicted in Fig. 1. Each transcript of interest is reverse transcribed with a gene-specific RT-primer (reverse primer) and the second strand of the cDNA is synthesized with a gene-specific forward primer. Both, the forward and the reverse primers carry additional sequence stretches at their 5'-ends that are identical for all gene-specific primer pairs used in the multiplex RT-PCR reaction. 1 The Beckman Coulter Capillary Electrophoresis product line including GeXP chemistry is now distributed and supported by AB SCIEX, Landwehrstr. 54, 64293 Darmstadt, Germany.
(www.SCIEX.com). The reverse transcription of a RNA is primed with a gene-specific reverse primer (GSrev) that carries a universal priming site at its 5'-end to obtain a first strand cDNA fused with the universal priming site. The synthesis of the second strand cDNA is primed with a gene-specific forward primer (GSfwd) that also carries a yet different sequence as universal priming site. Because of the two universal priming sites incorporated, the cDNA can be amplified with universal forward and reverse PCR primers (Ufwd, Urev, respectively) independently of the gene-specific sequence. As the universal forward primer is labelled with a fluorescence tag at its 5'-end, the PCR product can be detected after capillary electrophoretic separation. (B) Gene-specific primers are designed to yield PCR products of different lengths that can be separated by capillary electrophoresis from each other and quantified through the area of individual peaks in the elution profile (C). (C) The gene expression pattern of different samples can be compared through the cDNA amplified from a spike-in RNA that serves as an internal standard. A DNA ladder (size marker) added to the cDNA sample prior to capillary electrophoresis allows to determine the size of the PCR products. During PCR these sequences serve as specific binding sites for the two universal primers. The purpose of the universal primers is to amplify all cDNAs in the reaction mix independent of their length or sequence with equal efficiency during each cycle of the PCR. With the help of a spike-in RNA, an internal standard which is reverse transcribed together with the RNAs to be analysed, the amount of cDNA fragment amplified from each gene of interest can be normalized relative to the amount of fragment amplified from the spike-in RNA so that the relative transcript abundance for each gene becomes comparable between samples. By producing fragments of different length that are separated electrophoretically (Fig. 1), the relative abundances of the 35 chosen transcripts can be determined provided the system is calibrated appropriately. The normalized peak area for each amplification product which is determined during the electrophoretic separation can be fitted to the total RNA concentration by linear regression of the data points on a double logarithmic plot [6,7]. Here, we demonstrate that this empirical calibration procedure complies with a kinetic model of cDNA synthesis and amplification. The kinetic model predicts for a set of different transcripts a series of curves of equal slope shifted in parallel to each other. We experimentally confirm the model predictions and the validity of the model-based calibration procedure. Finally, we apply these methods to estimate the relative concentration of different transcripts of differentiation marker genes expressed in Physarum polycephalum plasmodial cells that are committed to sporulation [6], see SI Table 1. We also provide a set of Perl scripts as components of an integrated workflow for efficient data processing and evaluation, which is demonstrated in the context of the case study.

Growth and preparation of sporulation-competent plasmodia
Plasmodia of the apogamic strain WT31 [8] which is wild-type with respect to the photocontrol of sporulation were grown for four days at 24 °C in 1.5 L of growth medium [9] in a 5 L fermentor (Minifors, Infors HT, Bottmingen, Switzerland) inoculated with 2 % of a 3.5 days old shaken culture, supplied with 1 L of air per minute, and stirred at 250 rpm with a marine propeller. Microplasmodia were harvested and applied to starvation agar plates (9 cm diameter) as described [8].
Plates were incubated at 22 °C in complete darkness for 6 days if not stated otherwise. During this time period, one multinucleate macroplasmodium develops on each plate and, by starvation, becomes competent for sporulation. The incubation temperature is critical in order to avoid unwanted spontaneous sporulation. For the induction of sporulation, competent plasmodia were exposed to a far-red light pulse ( λ ≥700 nm, 13 W/m 2 ). The far-red light was generated by Concentra Weißlicht lamps (Osram, Munich, Germany) which was passed through an Orange 478 combined to a Blue 627 plexiglass filter (Röhm, Darmstadt, Germany). After light exposure, plasmodia were returned to the dark and incubated at 22 °C until the samples were taken. At five subsequent time points, a portion of each plasmodium was harvested (approximately one sixth of its original biomass for each sample), shock-frozen in liquid nitrogen and stored at -80 °C for RNA isolation. After taking the samples, the remainder of each plasmodium was maintained overnight in the dark to reveal the developmental decision (i.e. whether or not sporulation had occurred). Control plasmodia were treated identically except that the light pulse was omitted (dark controls). All manipulations were done under sterile conditions and under dim green safe light as previously described [8].

Preparation of total RNA
For extraction of total RNA, approximately 15 mg of fresh weight material which had been taken from an individual plasmodium was used. Samples were dissolved in 1.5 ml PeqGOLD TriFastTM solution (Nr 30-2020; PeqLab, Erlangen) by disrupting the cell material in the presence of glass beads ø=0.5 mm using a Precellys 24-Dual homogenizer, (Peqlab, Erlangen, Germany) for 10 seconds at room temperature.
The homogenate was transferred to 2 ml PeqGOLD PhaseTrap tubes (PeqLab; Nr 30-0150A-01). After addition of 300 µl CHCl 3 , vigorous shaking and centrifugation of the PhaseTrap tubes, the aqueous phase containing the RNA was transferred to a fresh Eppendorf cap following the manufacturer's instructions for the entire procedure. After addition of 0.9 volumes of a 4:1 mixture of ethanol and 1 M acetic acid, the RNA was precipitated at -20 °C for at least 24 h. The precipitate was spun down for 15 min at 18,000 x g and 4 °C, the pellet washed once with 400 µl ice-cold 3 M sodium acetate and twice with ice-cold 70 % ethanol followed by centrifugation for 10 min at 18,000 x g, 4 °C after each washing step. Finally, the RNA was dissolved in RNAse-free water. For removal of contaminating genomic DNA, 90 µl of each RNA sample were mixed with 10 µl rDNase buffer and 1 µl rDNase (rDNase Set Macherey-Nagel: 740963) and incubated for 12 min at 37 °C. The total RNA was finally purified using the RNA Clean-up Kit (Macherey-Nagel: 740948.250).
Approximately 300 ng aliquots checked for integrity of the 18 S and 28 S rRNA by agarose gel electrophoresis. For subsequent steps, samples were standardized to 20 ng/µl of total RNA.

Multiplex Gene Expression Profiling (GeXP)
Multiplex RT-PCR reactions were performed to simultaneously amplify 35 transcripts and the amplification products were quantified through separation on a Beckman Darmstadt, Germany (www.SCIEX.com)). cDNA synthesis and polymerase chain reaction were performed in half of the recommended volume leaving the recommended concentrations unchanged. Twenty ng of total RNA was used for each reverse transcription (see below). Controls without template were included to check the kit components for contaminating nucleic acids and controls without reverse transcriptase were use to check for contaminating DNA in the RNA samples.

Primers used.
Forward and reverse gene-specific primers for a set of 35 genes were originally desinged by [6] to generate amplified DNA fragments with similar GC contents and melting temperature of different lengths ranging between 114 and 357 bp (SI Table 1). Each primer contained a universal priming sequence at the 5'-end and a gene-specific sequence at the 3'-end. The concentration of each gene-specific reverse primer in the pre-mixed reverse primer plex was experimentally adjusted to give a fluorescence signal for each fragment that was in the linear sensitivity range of the measurement system. The attenuated concentration of each reverse primer depended on its sequence and was adjusted to a value between 0.5 to 0.00025 µM (SI Table 1).  Calibration Curves. A calibration curve for each transcript i was established to calculate its relative concentration in the analysed samples from the corresponding normalized fluorescence signal values integrated over the peak areas of the respective amplified DNA fragments as separated on the capillary sequencer. To obtain the calibration curve, total RNA samples from dark-and light-treated plasmodia were mixed as described in the Results section to give an appropriate analytical standard. The mix was repeatedly diluted, two-fold in each step, to obtain a set of concentrations of total RNA ranging from 100 ng to 0.78 ng per reaction volume. Each sample was measured twice with two independent RT-PCR-reactions.
The relative fluorescence signal F i (which is identical to the normalized peak area NPA) as measured for each DNA fragment i was obtained by normalizing its peak area relative to the peak area of the KAN r control RNA which served as internal standard. The logarithm of the normalized mean values was plotted against the logarithm of the relative total RNA concentration as obtained in the dilution series [7].
By taking the logarithm for the base 10, a straight line in the form

Results and Discussion
Kinetic model  Obtaining comparable amounts of PCR products from transcripts of highly different abundances. (a) When samples of total RNA containing transcripts of genes A, B, and C that yield cDNAs of highly uneven concentrations, the amounts of some transcript-specific PCR products may be so low that differences between samples (e.g. induced vs. control) cannot be precisely quantified by capillary electrophoresis (as shown for transcript C). In order to allow reliable detection of the transcripts within the dynamic range of their differential regulation, the two samples (#1 and #2) are mixed (Step 1) and the concentrations of the gene-specific reverse primers (see Fig. 1) adjusted so that approximately equal amounts of cDNA fragments are obtained by RT-PCR from each of the transcripts (Step 2). Finally, the two samples, #1 and #2 are re-analysed with the adjusted primer concentrations (Step 3). Now, the x-fold change in the abundance of the transcripts of all genes can be determined due to the sufficient amount of their corresponding cDNAs (b).
the concentration of the reverse primer will not change significantly in the course of the RT reaction.
It has been shown that binding of a primer to its template during the reassociation (renaturation) of DNA follows the rate law of a second order reaction [11,12]. This suggests that the rate limiting step for the reassociation is the nucleation reaction during which few bases of one strand bind to their fitting counterparts on the other strand. The subsequent zipping reaction must occur fast as  Step 1 Step 2 Step 3 Adjusted primer concentrations compared to the nucleation event of base pairing because otherwise a first order instead of second order kinetics would be observed. This conclusion is plausible as the local concentration of fitting bases greatly increases if two strands are brought in spatial proximity and appropriate orientation as a consequence of the zipping reaction.
The concentration of the RT-primer for the transcript of each gene of interest has to be adjusted to attenuate the relative amount of DNA amplified from its RNA during the GeXP RT-PCR reaction while the concentration of the RT-primer is in large excess as compared to the reverse transcribed mRNA (see above). We accordingly assume that primer binding is the rate-limiting step of reverse transcription (RT). Once priming of the RT reaction has occurred, the polymerization reaction of cDNA synthesis is equally fast for all 1st strand cDNA species concurrently synthesized in the reaction mix.
Note that only a tiny fraction of the low molecular weight reactants is consumed during the RT reaction such that their concentrations do hardly change during the experiment. Based on the assumption of a rate-limiting bimolecular reaction of RTprimer binding to the RNA template, we expect an empirical rate law for the time- where k i, j is the primer-specific rate constant for hybridization of the primer to its complementary RNA sequence, Primer i, j ⎡ ⎣ ⎤ ⎦ is the concentration of primer j of given sequence and length, complementary to the corresponding sequence stretch of the mRNA of gene i , and n is the empirical order of the reaction. As the concentration of the primers in the RT reaction is in a sufficiently high excess over that of each gene-specific RNA (see estimation above), the primer concentrations can be approximated as remaining constant during the reaction. By defining the pseudo first where the value of ′ k i, j depends on the primer concentration that had been appropriately adjusted by the experimenter. Separation of the variables and integration over the reaction time t and the change in RNA i concentration yields In the simplest case of a first order reaction where exponentially during the reverse transcription reaction of length t starting from its Alternatively, if the reaction order is different from one (n ≠ 1) , the integration yields and hence RNA i [ ] decays according to In both cases, for n = 1 and for n ≠ 1 , the concentration cDNA i [ ] of genespecific first strand cDNA, synthesized during the enitre RT reaction is and according to eqn (8) dependent on the initial concentration of the gene i -specific Overall amplification α of each 1st strand cDNA molecule after c cycles of PCR is  (4) and (7), the concentration cDNA i amp ⎡ ⎣ ⎤ ⎦ of the gene i -specific cDNA and for n ≠ 1  (11) and (12) further predicts how the concentation of cDNA fragments depends on the initial concentration of the gene-specific reverse primer used for the RT reaction (SI Fig. 2).

Experimental validation of the calibration procedure
The kinetic model of the GeXP RT-PCR predicts that for a set of transcripts one obtains a series of parallel shifted lines when the concentration of amplified genespecific cDNA fragment is double logarithmically plotted versus the total RNA concentration used in the assay. For experimental validation of this prediction we used Physarum polycephalum total RNA isolated from plasmodial cells.
Starving plasmodial cells can be induced to sporulation by a brief pulse of farred light which is sensed by a phytochrome photoreceptor [10,13,14]. The cellular commitment to sporulation is associated to the differential expression of developmentally regulated genes [15]. Based on transcriptome data [15,16] we have down-regulated transcript, respectively. For the three pairs of genes the ratio of the normalized peak areas for the corresponding two fragments was plotted against the relative concentration of the two samples in the mixture. In the three pairs of fragments the normalized peak area for each fragment was proportional to its relative amount in the mixture (SI Fig. 3), confirming that the peak detection algorithm of the software delivered with CEQ 8800 TM capillary sequencer worked correctly.

Calibration curves for the 35 transcripts.
In order to obtain calibration curves for each of the 35 transcripts, we mixed two RNA samples (from light-induced and control cells) in a 1:1 ratio in order to obtain a sample that contained even the differentially regulated transcripts at intermediate concentration (Fig. 2). This RNA mixture was serially diluted and, after addition of spike-in RNA to each sample of the dilution series, the cDNA fragments as amplified by the GeXP multiplex-RT-PCR were separated electrophoretically. The normalized peak area determined for each cDNA fragment was plotted versus the concentration of total RNA. The data points obtained for all fragments were fitted to straight lines in the double logarithmic plot.
The slopes as obtained by the 35 fits were averaged and the best-fitting intercept for each of the 35 curves was calculated. Accordingly, this procedure gave a series of fitted curves that only differed in being shifted with respect to the abscissa (Fig. 3).
As predicted by the kinetic model, for none of the genes any systematic deviation of the data points from the fitted straight line was seen (Fig. 3). Hence, all fragments were amplified with the same efficiency by the universal primers. The parallel shift relative to the abscissa, according to the kinetic model, is due to the different absolute concentrations of the mRNAs and to the different amounts of cDNA produced during the RT reaction at the particular concentration of the RT-primer.
Noteworthy, the slopes of the curves clearly were smaller than 1 as observed in most experiments performed in our lab. A slope of 1 would be expected if the cDNA synthesis occurred with first order kinetics as assumed in eqn. (11). The kinetic model therefore suggests that the RT reaction occurs with a reaction order of n ≤ 1 (eqn (12)). The value of n observed in each particular experiment might slightly vary depending on the experimental conditions.

Efficiency of concurrent amplification of different transcripts in one
sample. Since cDNA fragments must be of different lengths in order to get separated during capillary electrophoresis and the efficiency of amplification might depend on the fragment length, we wished to confirm that the amount of cDNA amplified for any given transcript is independent of other fragments that are concurrently amplified in the same sample. Therefore we mixed the RNA sample from the control cells with the sample from the light-induced cells taking different relative amounts and estimated for each gene how the normalized peak area changed with the relative log (Normalized Peak Area) ◀ Fig. 3. Calibration curves for the transcripts of 35 genes as amplified by GeXP RT-PCR in the same tube. A sample of a total RNA pool from light-induced and control plasmodial cells (see Fig. 2) was serially diluted and the 35 cDNA fragments quantified after capillary electrophoretic separation. Each RNA concentration was analysed twice and the two resulting data points, normalized peak area (NPA) versus concentration of total RNA, displayed in a double logarithmic plot (logarithm to the base 10). All data points were fitted to a straight line of the same slope as described in Materials and Methods. For names of genes see SI Table 1. amount of the RNA from light-induced cells or from the dark controls (Fig. 4). For each fragment, the normalized peak area and hence the amount of amplified cDNA was a linear function of the mixing ratio with a slope depending on whether genes were up-regulated or down-regulated. Curves for genes that are not regulated in response to the photo-stimulus fitted to a straight line that was parallel to the abscissa (Fig. 4). Hence, the amount of amplified DNA for each gene was found to be proportional to the concentration of the corresponding mRNA in the sample.

By quantifying the light-induced expression of differentiation marker genes in
Physarum polycephalum plasmodial cells, we present a workflow including the application of Perl scripts for calibration and data evaluation.

Stimulation of cells, sample preparation, and biological replicates.
Plasmodial cells were starved for six to seven days to gain sporulation competence.  Table 1) were quantified as described above.

Data processing workflow and Perl scripts. The electrophoretic separation of the GeXP multiplex RT-PCR samples was performed in the CEQ 8800 TM Genome
Analysis System run via the so-called Control Center, a software package for operating the system and for storing the data. All further data processing steps were performed by Perl scripts as described in the Supplementary Information (see also SI Figs. 4 and 5).

Light-induced gene expression and technical replicates.
SI Fig. 4. Electrophoresis and signal processing. (A) A DNA ladder of fragments of known size is added to each sample of amplified cDNAs and fragments are separated electrophoretically on a CEQ 8800 analyser. The CEQ TM 8800 software (Control Center) saves the electropherogram as Sample Data in the database (step 1). After the run is completed, the operator calls the FRAGMENTS module of the Control Center to generate the Fragment Results, a table which assigns the peak height, the peak area, and the calculated fragment length to each peak detected above threshold (step 2) which are also stored in the database of the Control Center (step 3). All peaks for which the peak height is smaller than 0.5% of the height of the second largest peak are considered to be below threshold. The cutoff value of 0.5% as used in this study is to be specified by the user of the CEQ TM 8800 Control Center, is kept constant for all experiments, and defines the detection limit for each peak. The Peaks (Fragment Results, consisting of peak height, peak area, and fragment length) and the Electropherogram (Sample Data, fluorescence intensity versus retention time) of each sample are exported from the database of the CEQ8800 Control Center in the form of tables as tab-separated text files (steps 4 and 5, respectively). The Peaks file is lateron used for calibration or fragment quantification (see SI Fig. 5). (B) For convenient analysis, the Electropherogram and Peaks files may be transferred to a separate personal computer for further data processing. The Electropherogram file extension is renamed to change the file extension to .ceqraw and displayed in Gnuplot by running the scripts gep_renameraw and gep_rawplot, respectively. The graphical representation of each electropherogram (panel lower left; peaks of cDNA fragments in blue, peaks of the DNA size standard in red) is visually inspected to assess the quality of the separation run before the data are further numerically analyzed. The Peaks file extension is renamed by the script gep_renamefrag as well in order to specify the file type. The panel lower right is generated by the script gep_fragplot. It graphically displays the detected peaks in relation to the size ranges of the expected fragments (blue) and the size ranges of the reference fragments (of the DNA size standard; red) which both are used to automatically assign the detected peaks to the respective fragments. With the help of this plot one can assess the quality of each separation. For all fragments to be quantified, this script identifies fragment-specific peaks by assigning the largest peak from the list of fragments file that has a measured fragment length of ±0.8 nt of the expected approximate size as listed in the Fragments file for the respective fragment. The script then normalizes the peak area of each fragment to the peak area measured for the spike-in RNA.
With these values the script computes for each fragment a calibration curve by performing a linear regression of the log (normalized peak area) versus log (total RNA concentration) values (1st linear regression). The slopes obtained for all expected gene-specific fragments in the 1st linear regression are averaged and the data points for each gene-specific fragment are fitted to a straight line by a 2nd linear regression that computes the intercept obtained for the average slope. The resulting curve fit parameters of each gene-specific fragment to be analysed, are stored in the Calibration curves file. The entries in the Calibration curves file are inspected line-wise for consistency of slope and intercept values obtained by the 1st and the 2nd linear regression. (B) Fragment quantification. The curve fit parameters (mean slope and intercept obtained by the 2nd linear regression) of the calibration curves (as contained in the file Calibration curves) are used to quantify gene-specific fragments by evaluating Peaks files obtained by analysing experimental samples. For each gene-specific fragment in each sample, the script gep_calcamount calculates its relative amount and the corresponding detection limit. The detection limit may be different for each gene-specific fragment and differ between electrophoretic separations. It is estimated taking the 0.5% value of the height of the second largest peak in the electrophoretic separation of the analysed sample and the parameters (mean slope and intercept) of the 2nd linear regression of the valid calibration curve obtained for each respective fragment. For download of the relevant scripts and example data files see Supplementary Information.

Installation under Linux
To run the software on a Linux computer, retrieve the files from https://github.com/markushaas/genexpro4ceq8800 and install them by following the install instructions that are provided together with the files.

Installation under Mac OS X or Windows R
For running the Perl scripts under Mac OS X or Windows R operating systems (os), we have set up a virtual Linux machine (vm), a package, in which the Linux operating system and all files necessary for the GeXP data analysis workflow (Perl scripts, example files, and ReadMe) are put in place. Download from github is therefore not necessary. Installation of the virtual machine can be easily performed by going step by step through the following protocol: • Install "VirtualBox" from Oracle (https://www.virtualbox.org/).
• Download the vm image file we have prepared by clicking on the following link http://www.regulationsbiologie.ovgu.de/Downloads.html or by copying the link into the address bar of your browser window.
• Open "VirtualBox" and select "Import Appliance..." from the "File" menu, choose the vm image file "gep_vm.ova" you have downloaded and follow the instructions of the "VirtualBox" dialogue.
• To share a folder with your host os (for shutteling files between your os and the virtual machine), create a folder with the name ".virtualboxsharedfolder". If you are using a Windows® pc you may need to create the folder within a terminal window of your os as the file manager does not accept names beginning with a "." so execute cmd.exe and enter "mkdir .virtualboxsharedfolder".
• Right click on the "gep_vm" item that shows up in the VirtualBox window and select "Settings" and therein "Shared Folders". Click on the "Add new shared folder" icon, select the folder ".virtualboxsharedfolder", and enable "Automount". Files in this folder can be accessed by both, your host and the operating system run by the virtual machine.
• Start the vm. Depending on the BIOS installed on your computer, you may have to enable "Intel® Virtualization Technology" in the BIOS settings in order to allow you to use a vm on your pc. Refer to the manual of your hardware if necessary.
• After booting the vm open the "Introduction" file (by double clicking the icon) on the Desktop of the vm to get some help with the GeXP data analysis workflow. To adapt the resolution of the virtual screen, adjust the settings in the "view" menu of the virtual machine window.
• In case of problems with the shared folder or the screen resolution that might occur under the particular os you are working with, refer to the internet to get help.