Evaluation of five different cDNA labeling methods for microarrays using spike controls

Background: Several different cDNA labeling methods have been developed for microarray-based gene expression analysis. We have examined the accuracy and reproducibility of five such commercially available methods in detecting predetermined ratio values from target spike mRNAs (A. thaliana) in a background of total RNA. The five labeling methods were: direct labeling (CyScribe), indirect labeling (FairPlay™, aminoallyl), two protocols with dendrimer technology (3DNA® Array 50™ and 3DNA® submicro™), and hapten-antibody enzymatic labeling (Micromax™ TSA™). Ten spike controls were mixed to give expected Cy5/Cy3 ratios in the range 0.125 to 6.0. The amounts of total RNA used in the labeling reactions ranged from 5 to 50 μg.

Results: The 3DNA array 50 and CyScribe labeling methods performed best with respect to relative deviation from the expected values (16% and 17% respectively). These two methods also displayed the best overall accuracy and reproducibility. The FairPlay method had the lowest total experimental variation (22%), but its estimated values were consistently higher than the expected values (relative deviation 36%). TSA had both the largest experimental variation and the largest deviation from the expected values (45% and 48% respectively).

Conclusion: We demonstrate the usefulness of spike controls for validating and comparing cDNA labeling methods for microarray experiments.


Background
High-throughput global gene expression analysis with cDNA- and oligonucleotide-based microarrays has become a common research tool [1,2]. However, the method still suffers from inadequate precision because of the many sources of variation in the experimental process [3][4][5]. Important parameters for a reliable cDNA microarray experiment include: 1) the quality of the glass slide, 2) the quality and quantity of the probes (e.g. PCR products) printed on the slide, 3) the quality and quantity of the RNA samples, 4) the cDNA labeling method, 5) the hybridization protocol, and 6) the scanning procedure. Much effort has gone into optimizing and standardizing each of these steps [6][7][8][9][10][11][12][13][14][15][16], but few data sets describe the full range of methods and strategies in use, especially for labeling cDNA target samples. The reproducibility, sensitivity and accuracy of a selection of labeling methods for cDNA microarray hybridization have recently been compared [13][14][15][16]. However, none of these studies used external mRNA standards (spikes) with a predetermined ratio distribution to evaluate the accuracy and reproducibility of the different methods.
In this study, we added various amounts of 10 different spike mRNAs (Arabidopsis thaliana) to two samples of total RNA. The ratio data generated from these spikes were used to evaluate and compare five commercially available cDNA labeling methods.

Results and discussion
We used an approach based on a series of external standards (spikes) to evaluate the reproducibility and accuracy of five commercially available cDNA labeling methods: direct labeling (CyScribe), indirect labeling (FairPlay), two protocols with dendrimer technology (3DNA Array 50, abbreviated 3DNA50, and 3DNA submicro, abbreviated 3DNA), and hapten-antibody enzymatic labeling (TSA). Predefined amounts of 10 exogenous A. thaliana mRNAs were added to two rat BT4C total RNA samples (from two different treatments of the cells), resulting in a known ratio distribution for the spikes (range 0.125 to 6.0; see Methods).
The observed ratios of the 10 spikes, calculated as the median of medians of ratios (MMR; Table 1), showed that spikes with expected ratios below 1.0 were best reproduced with the TSA method, whereas FairPlay showed the largest deviations from the expected values for these spikes. For Spike 1, only CyScribe gave an observed value close to the expected 1.0; the other four methods produced values above 1.0. The TSA method showed the largest deviations from the expected values for spikes with expected ratios in the range 2.0 to 6.0 (Table 1), and its between-array variation was also highest for these high-ratio spikes.
Overall, the relative deviations from the expected ratios (Table 2) were lowest for CyScribe, 3DNA50 and 3DNA (16%, 17% and 24%, respectively) and largest for TSA and FairPlay (48% and 36%, respectively).
We calculated the median total coefficient of variation of ratios (CV) over the 10 spikes for each method (Table 2). The FairPlay method showed the lowest total experimental CV (22%), followed by 3DNA50 and CyScribe (26% and 38%, respectively). The TSA and 3DNA methods showed the largest total experimental variation (45%). The total variability was decomposed into between-array and within-array variability using a one-way analysis of variance (see Methods). For all methods except 3DNA, the between-array variation was almost twice the within-array variation (Table 2).
Accuracy and reproducibility were evaluated jointly with the relative accuracy and reproducibility parameter (RAR; see Methods and Table 2). 3DNA50 and CyScribe showed the lowest RAR values (0.10 and 0.17, respectively), whereas the methods using low amounts of starting RNA (3DNA and especially TSA) showed high RAR values (0.28 and 0.68, respectively).
Shrinkage of relative expression ratios in microarrays, especially with the 3DNA method, has been reported previously [15,16]. We did not observe shrinkage of the spike ratios with the 3DNA method, although we detected saturation at the high end of the observed ratios for this method. A more accurate evaluation of shrinkage would probably require spikes with even larger expected ratios than those used here.
The relatively high CV values seen with the TSA method may result from the high, non-uniform background fluorescence observed in all four replicate hybridizations with this method (data not shown). High background levels with the TSA method were also reported by Richter and co-workers [15]. The 3DNA method produced arrays with the lowest signal intensities, which may in turn explain the large experimental variation also observed with this method.
The RAR values calculated in this study may indicate a positive correlation between the quantity of RNA in the labeling reaction and the accuracy and reproducibility of the labeling method. In the study by Richter et al. [15], both direct and indirect methods were more reliable than the 3DNA and TSA methods when compared with Northern blot results. Manduchi and colleagues [16] reported similar observations on the overall performance of the direct, indirect and 3DNA methods.

Conclusions
In conclusion, the 3DNA50 and CyScribe methods showed the best overall performance. The FairPlay method had the lowest experimental variation, but its estimated ratios were consistently higher than the expected values. TSA had both the largest experimental variation and the largest deviation from the expected values.
When the amount of starting RNA was not limiting, the three labeling methods 3DNA50, CyScribe and FairPlay performed comparably. With small quantities of total RNA as template, 3DNA was the better of the two methods analyzed (3DNA and TSA). However, because the 3DNA method showed considerable experimental variation, we suggest that researchers also consider labeling methods other than these two when the amount of input RNA is small. Alternatives could include using amino C6-dT-modified random hexamers to prime cDNA synthesis together with aminoallyl dUTP [17], or RNA amplification methods [18,19]. Resonance light scattering (RLS) particles for signal detection are another promising technology that also allows small amounts of starting RNA [20].

Methods

Rat cDNA microarrays
The rat cDNA microarrays used in this study were printed by and purchased from The Norwegian Microarray Consortium (NMC: http://www.mikromatrise.no/). In addition to the ~13800 sequence-verified rat cDNA probes from Research Genetics (Huntsville, AL, USA; http://www.resgen.com/), printed in duplicate on amino silane-coated slides (CMT GAPS II, Corning Life Sciences, Corning, NY), ten different cDNAs from Arabidopsis thaliana (Spot-Report™, Stratagene, La Jolla, CA, USA) were each printed 32 times on the slides.

The following cDNA labeling kits were used (µg total RNA is given in Table 1):

The amount of starting RNA used for each method was chosen based on the recommendations in the manufacturers' protocols and is shown in Table 1. All cDNA labeling reactions were performed as recommended by the manufacturers, with the following modification: the labeled cDNA samples were purified and concentrated using Microcon® columns (YM-30; Millipore, Bedford, MA, USA) in all protocols except for the 3DNA Submicro Expression Array Detection Kit.

Hybridization
Identical prehybridizations were performed for all 20 microarray experiments. The arrays were incubated for 45 min at 65°C in a 50 ml plastic tube containing 35 ml of prehybridization buffer (5x SSC, 0.1% SDS, 1% BSA), washed in ddH2O (five times in each of two separate tubes; RT) and in isopropanol (five times; RT), and then dried by centrifugation at 1000 rpm for 2 min in a microplate centrifuge.
The post-hybridization treatment and washing were performed as recommended by the manufacturers for both 3DNA methods and the TSA method. For the CyScribe and FairPlay methods, the slides were washed in 2x SSC, 0.1% SDS (~65°C) to remove the cover slip, followed by three washing steps with agitation: 1x SSC (~65°C; 5 min), 0.2x SSC (RT; 5 min) and 0.05x SSC (RT; 1 min). Finally, the slides were spun dry by centrifugation at 1000 rpm for 2 min in a microplate centrifuge.
A total of four arrays were hybridized for each labeling method. The same quality-controlled array batch was used for all experiments and the same person did all of the hybridizations.

Scanning and data analysis
All arrays were scanned with the GenePix® 4000B scanner (Axon Instruments Inc., Union City, CA, USA), and the images were analyzed with the GenePix® Pro 3.0 image analysis software (Axon Instruments). The median spot and local background intensities were then imported into R, the language and environment for statistical computing and graphics (http://www.r-project.org/). Filtering was performed in three steps. First, spots flagged automatically by the GenePix software were excluded. Next, spots with background-subtracted intensities below 200 in both channels were removed. Finally, spots with a signal-to-local-background ratio (S/B) below 1.5 in either channel were excluded.
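The three filtering rules can be expressed as a single per-spot predicate. This is a minimal Python sketch (the study itself used R); the `Spot` record and the `passes_filters` function are hypothetical names, and we assume for illustration that flagged spots carry a negative flag value (GenePix uses negative codes for its Bad, Absent and Not Found flags). S/B is taken here as median foreground divided by median local background, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """Median intensities for one spot, as exported from image analysis."""
    f_cy5: float  # median foreground intensity, Cy5 channel
    b_cy5: float  # median local background intensity, Cy5 channel
    f_cy3: float  # median foreground intensity, Cy3 channel
    b_cy3: float  # median local background intensity, Cy3 channel
    flag: int = 0  # assumption: negative values mark automatically flagged spots


def signal_to_background(fg: float, bg: float) -> float:
    """S/B ratio; a non-positive background is treated as infinitely clean."""
    return fg / bg if bg > 0 else float("inf")


def passes_filters(s: Spot, min_intensity: float = 200.0, min_sb: float = 1.5) -> bool:
    # Step 1: drop spots flagged automatically by the image analysis software.
    if s.flag < 0:
        return False
    # Step 2: drop spots whose background-subtracted intensity is below
    # the threshold in BOTH channels.
    cy5 = s.f_cy5 - s.b_cy5
    cy3 = s.f_cy3 - s.b_cy3
    if cy5 < min_intensity and cy3 < min_intensity:
        return False
    # Step 3: drop spots whose S/B is below the threshold in EITHER channel.
    if (signal_to_background(s.f_cy5, s.b_cy5) < min_sb
            or signal_to_background(s.f_cy3, s.b_cy3) < min_sb):
        return False
    return True


spots = [
    Spot(1500, 100, 900, 120),             # kept
    Spot(1500, 100, 900, 120, flag=-100),  # removed: flagged
    Spot(250, 100, 280, 120),              # removed: below 200 in both channels
    Spot(1500, 100, 160, 120),             # removed: Cy3 S/B = 1.33 < 1.5
]
kept = [s for s in spots if passes_filters(s)]
print(len(kept))  # 1
```

Note the asymmetry: the intensity rule removes a spot only when both channels are weak, while the S/B rule removes it as soon as either channel is close to background.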
We normalized the log2-ratio data (spike values not included) by using print-tip group loess normalization (degree 2 and span 0.4) as described by Yang et al. [21]. The spike ratios were then adjusted by normalization factors obtained from the loess curves. This normalization procedure was equally applied to all arrays.
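The normalization step can be sketched as follows, under stated assumptions. This Python sketch is not the R code used in the study: `loess_predict` is a simplified local quadratic regression (the nearest span × n points, tricube weights, no robustness iterations), so its values will differ somewhat from R's `loess()`, whose implementation details such as interpolation differ. We also assume that a spike's normalization factor is the value of its print-tip group's curve at the spike's own A value; all function and argument names are hypothetical.

```python
import numpy as np


def m_and_a(cy5, cy3):
    """Log2 ratio M and average log2 intensity A for each spot."""
    cy5 = np.asarray(cy5, dtype=float)
    cy3 = np.asarray(cy3, dtype=float)
    return np.log2(cy5 / cy3), 0.5 * np.log2(cy5 * cy3)


def loess_predict(x, y, x_new, span=0.4, degree=2):
    """Simplified loess: tricube-weighted local polynomial fit at each x_new."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(degree + 2, int(np.ceil(span * len(x))))  # neighbours per local fit
    fitted = np.empty(len(x_new))
    for i, x0 in enumerate(np.asarray(x_new, dtype=float)):
        dist = np.abs(x - x0)
        nearest = np.argsort(dist)[:k]
        h = dist[nearest].max() or 1.0  # radius of the neighbourhood
        w = (1.0 - (dist[nearest] / h) ** 3) ** 3  # tricube weights
        # Columns 1, (x - x0), (x - x0)^2: the intercept is the fit at x0.
        X = np.vander(x[nearest] - x0, degree + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[nearest] * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted


def normalize_spikes_by_print_tip(groups, span=0.4):
    """Normalize spike log-ratios within each print-tip group.

    groups maps a print-tip id to (A_rat, M_rat, A_spike, M_spike). The curve
    is fitted to the rat probes only (spikes excluded), and each spike's M is
    shifted by the curve value at its own A (an assumption of this sketch).
    """
    normalized = {}
    for tip, (a_rat, m_rat, a_spike, m_spike) in groups.items():
        curve = loess_predict(a_rat, m_rat, a_spike, span=span, degree=2)
        normalized[tip] = np.asarray(m_spike, dtype=float) - curve
    return normalized


# Noise-free toy data: a smooth, intensity-dependent dye bias on the rat probes.
a = np.linspace(7.0, 14.0, 300)
m = 0.3 - 0.05 * (a - 10.0) + 0.02 * (a - 10.0) ** 2
norm = normalize_spikes_by_print_tip(
    {1: (a, m, np.array([8.0, 11.5]), np.array([1.0, -1.0]))}
)
print(np.round(norm[1], 4))  # approximately [0.52, -1.27]
```

Fitting the curve to the rat probes only keeps the deliberately skewed spike ratios from influencing the normalization; the spikes are only corrected by it.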

Statistics
Median of medians of ratios (MMR). For each spike we calculated the median of ratios within each array and then calculated the median of these median ratios from the replicate arrays, obtaining one total measure of expression ratio for each spike.
Relative deviation from the expected values (RD) is the absolute difference between the MMR and the corresponding expected ratio, expressed as a percentage of the expected ratio.
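MMR and RD for a single spike can be computed as below. This Python sketch (the study used R) uses hypothetical function names and made-up ratios; RD is returned as a fraction, to be multiplied by 100 for the percentages reported in the tables.

```python
from statistics import median


def mmr(ratios_per_array):
    """Median of medians of ratios: median within each array, then across arrays."""
    return median(median(array_ratios) for array_ratios in ratios_per_array)


def relative_deviation(observed, expected):
    """RD as a fraction of the expected ratio (multiply by 100 for percent)."""
    return abs(observed - expected) / expected


# Made-up ratios for one spike on four replicate arrays
# (the study had up to 32 spots per spike on each array).
arrays = [
    [2.0, 2.5, 2.25],
    [2.5, 2.75, 2.5],
    [1.75, 2.0, 2.25],
    [2.25, 2.5, 3.0],
]
observed = mmr(arrays)
print(observed)                           # 2.375
print(relative_deviation(observed, 2.0))  # 0.1875
```

Taking medians at both levels keeps a few aberrant spots, or one poor array, from dominating the spike's summary ratio.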
Total coefficient of variation of ratios (CV) for each spike was calculated as the standard deviation of the 128 ratios (32 per array times four replicate arrays) divided by their median rather than their mean.
Relative accuracy and reproducibility (RAR) was calculated for each spike as the sum of the squared RD and the squared total CV, giving a combined measure of accuracy and reproducibility for that spike ratio.
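Total CV and RAR for one spike can be sketched as below, again in Python with hypothetical names and made-up ratios. Two assumptions are ours: the standard deviation is the sample standard deviation (denominator n - 1, the same convention as R's `sd()`), and RD and CV enter RAR as fractions rather than percentages, which is consistent with the RAR magnitudes (0.10 to 0.68) reported in Table 2.

```python
from statistics import median, stdev


def total_cv(ratios):
    """Total CV: sample SD of all pooled ratios for one spike, divided by their median."""
    return stdev(ratios) / median(ratios)


def rar(rd, cv):
    """Relative accuracy and reproducibility: squared RD plus squared total CV.

    Both arguments are fractions (0.17 rather than 17%).
    """
    return rd ** 2 + cv ** 2


# Made-up pooled ratios for one spike; the study pooled 128 values
# (32 spots on each of four arrays).
pooled = [1.5, 2.0, 2.0, 2.5, 3.0]
cv = total_cv(pooled)
rd = 0.10  # hypothetical RD for the same spike, e.g. computed from its MMR
print(round(cv, 3))           # 0.285
print(round(rar(rd, cv), 3))  # 0.091
```

Because both terms are squared, RAR puts bias (large RD) and noise (large CV) on the same scale, and a low RAR requires both to be small.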
A one-way analysis of variance (ANOVA) was fitted to the log-ratio data for each spike in each method (32 observations for each of four arrays), with array as the "treatment effect". The total variability for each spike was decomposed into variability between arrays (treatment sum of squares) and variability within arrays (error sum of squares). All statistical analyses were done in R.
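The sum-of-squares decomposition can be sketched in Python with NumPy as below. The function name and the simulated log-ratios are hypothetical; the sketch shows only the standard one-way ANOVA identity (total SS = between-array SS + within-array SS) with array as the grouping factor. How these sums were converted into the between- and within-array variation reported in Table 2 is not specified here, so the mean squares returned are plain ANOVA quantities.

```python
import numpy as np


def one_way_anova_ss(log_ratios_per_array):
    """Decompose the variability of one spike's log-ratios, with array as the factor.

    Returns the between-array (treatment) and within-array (error) sums of
    squares, their total, and the corresponding mean squares.
    """
    groups = [np.asarray(g, dtype=float) for g in log_ratios_per_array]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    ss_between = float(sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups))
    ss_within = float(sum(((g - g.mean()) ** 2).sum() for g in groups))
    ss_total = float(((pooled - grand_mean) ** 2).sum())
    k, n = len(groups), len(pooled)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "ms_between": ss_between / (k - 1),
        "ms_within": ss_within / (n - k),
    }


# Simulated log2 ratios for one spike: 32 spots on each of four arrays,
# with a small array-specific offset added to each array.
rng = np.random.default_rng(0)
offsets = [0.0, 0.15, -0.10, 0.05]
arrays = [1.0 + off + rng.normal(0.0, 0.2, size=32) for off in offsets]
result = one_way_anova_ss(arrays)
print(np.isclose(result["ss_between"] + result["ss_within"], result["ss_total"]))  # True
```

With four arrays of 32 spots each, the between-array term has 3 degrees of freedom and the within-array term 124.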

Authors' contributions
AB did the processing and analysis of microarray data and drafted the manuscript. HE and VS guided data processing and participated in editing of the manuscript. RL coordinated the study, performed the practical laboratory work and edited the manuscript.