- Methodology article
- Open Access
A rapid and inexpensive labeling method for microarray gene expression analysis
© Ouellet et al; licensee BioMed Central Ltd. 2009
Received: 21 May 2009
Accepted: 25 November 2009
Published: 25 November 2009
Global gene expression profiling by DNA microarrays is an invaluable tool in biological research. However, existing labeling methods are time-consuming and costly and therefore often limit the scale of microarray experiments and sample throughput. Here we introduce a new, fast, inexpensive method for direct random-primed fluorescent labeling of eukaryotic cDNA for gene expression analysis and compare the results obtained on the NimbleGen microarray platform with those of two other widely used labeling methods, namely the NimbleGen-recommended double-stranded cDNA protocol and the indirect (aminoallyl) method.
Two total RNA samples were labeled with each method and hybridized to NimbleGen expression arrays. Although all methods tested here provided similar global results and biological conclusions, the new direct random-primed cDNA labeling method provided slightly better correlation between replicates than the other methods and thus an increased ability to find statistically significant differentially expressed genes.
The new direct random-primed cDNA labeling method introduced here is suitable for gene expression microarrays and provides a rapid, inexpensive alternative to existing methods. Using NimbleGen microarrays, the method produced excellent results comparable to those obtained with other methods. However, the simplicity and cost-effectiveness of the new method allow for increased sample throughput in microarray experiments and make the process amenable to automation with a relatively simple liquid handling system.
DNA microarrays allow global profiling of nucleic acid sequences and have become an important and ubiquitous tool in biological and biomedical research. Although many applications of DNA microarrays have been developed in the past decade [1, 2], differential gene expression profiling remains the most widely used application of this technology. Improvements in microarray design now allow rapid fabrication of custom microarrays, representation of an increasingly large number of features on a single glass slide and hybridization of multiple samples on physically separated arrays on the same slide. Robots designed specifically for DNA and RNA extraction are also commercially available now and can considerably reduce the hands-on time required for RNA preparation for microarray studies. Although identification of the most biologically relevant information from a microarray experiment and interpretation of this information in a biological context can be challenging, methods and tools for microarray data analysis have become more widely available and easy to use, and are now streamlining the first step of data analysis. However, the sample labeling procedure remains a rate-limiting step in high throughput microarray workflows.
Several methods to fluorescently label cDNA for gene expression have been developed over the years (reviewed in  and ). The first method introduced was the direct incorporation of fluorophore-conjugated nucleotides during reverse transcription (RT). However, this method suffered from lower cDNA yields and significant dye bias (in two-color experiments) due to steric hindrance of the large fluorescent moieties attached to the labeled nucleotides. An indirect method of cDNA labeling, where modified (i.e. aminoallyl) nucleotides are incorporated into the cDNA and chemically coupled with the fluorescent dye after RT, was developed to overcome these shortcomings. This indirect method provided increased dye incorporation and mitigated dye bias, and has become a benchmark for microarray sample labeling, especially in dual labeling experiments. However, this method increased the sample preparation time and cost significantly. Other "indirect" labeling methods were also developed, mainly aimed at increasing the specific fluorescence of the labeled product and thereby permitting the use of lower amounts of starting material (e.g. DNA dendrimers), but these methods are still not widely used. Instead, template (RNA) amplification methods, mostly based on an early in vitro transcription method , coupled with traditional downstream labeling methods, are more broadly adopted when the amount of available RNA is limited, notably because of the efficiency and robustness of the process, as well as the great flexibility it provides regarding the amount of input RNA needed. More recently, NimbleGen introduced a new labeling method based on double-stranded cDNA synthesis followed by labeling with a DNA polymerase by extension of 5'-labeled random primers . This method is very robust in that the yield of each step is excellent and it produces an abundance of labeled material. However, it is costly and requires the most time to perform.
We sought a method for fluorescently labeling cDNA for microarray analysis that would be rapid to perform, limit the need for manual handling and reduce the cost significantly when the RNA input is not limiting. In this study we demonstrate a new one-step labeling method, the direct random-primed cDNA labeling method (hereafter referred to as the direct random method), which is based on the elongation of 5'-labeled random DNA nonamers during reverse transcription of eukaryotic total RNA. We demonstrate the suitability of our method for gene expression analysis by comparing results with those obtained using the indirect and the NimbleGen-recommended ds-cDNA protocols.
Results and discussion
Overview of labeling methods, cDNA yield and dye incorporation
Table 1. cDNA yield, dye incorporation and amount of material used for hybridization (n = 6 for each method).
cDNA yield      Dye incorporation      Labeling yield (avg. ± S.D.) (pmol dye/μg cDNA)
25.8 ± 0.8      463 ± 66               18 ± 3
0.33 ± 0.06     33 ± 5                 105 ± 26
7.7 ± 0.6       85 ± 6                 11.1 ± 0.5
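The labeling yield is the ratio of incorporated dye to recovered cDNA. The following is a minimal sketch of that arithmetic from spectrophotometer readings, assuming the A260 conversion factors given in the methods (37 for ss-cDNA, 50 for ds-cDNA). The Cy3 extinction coefficient of 150,000 M^-1 cm^-1, the 10 mm-equivalent absorbance values and the function names are illustrative assumptions, not values or code taken from this study.

```python
# Conversion factors quoted in the methods (ng/ul per A260 unit, 10 mm path).
SS_CDNA_FACTOR = 37.0
DS_CDNA_FACTOR = 50.0
# Assumed molar extinction coefficient for Cy3 (M^-1 cm^-1); not stated in the article.
CY3_EXTINCTION = 150_000.0


def cdna_ug(a260, factor, volume_ul):
    """Total cDNA (ug) from a 10 mm-equivalent A260 reading and the eluate volume."""
    return a260 * factor * volume_ul / 1000.0  # ng -> ug


def dye_pmol(a_dye, volume_ul, extinction=CY3_EXTINCTION, path_cm=1.0):
    """Total dye (pmol) via Beer-Lambert: c = A / (epsilon * l)."""
    molar = a_dye / (extinction * path_cm)  # mol/l
    return molar * 1e6 * volume_ul          # 1 mol/l equals 1e6 pmol/ul


def labeling_yield(dye_pmol_total, cdna_ug_total):
    """Labeling yield in pmol dye per ug cDNA."""
    return dye_pmol_total / cdna_ug_total


if __name__ == "__main__":
    # Hypothetical readings for one single-stranded sample eluted in 50 ul.
    cdna = cdna_ug(a260=4.0, factor=SS_CDNA_FACTOR, volume_ul=50.0)
    dye = dye_pmol(a_dye=0.25, volume_ul=50.0)
    print(f"{cdna:.1f} ug cDNA, {dye:.0f} pmol dye, "
          f"{labeling_yield(dye, cdna):.1f} pmol/ug")
```

For the ds-cDNA protocol the same functions would be called with `DS_CDNA_FACTOR`, and a different dye (for example Alexa Fluor 555 in the indirect method) would need its own extinction coefficient.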
Concordance of microarray results
Despite the lower dye incorporation in the cDNA using our direct random method, we decided to hybridize the samples to NimbleGen 4-plex expression microarrays. For each labeling method, different amounts of labeled cDNA were hybridized to the arrays (Table 1). Since NimbleGen recommends hybridization of 4 μg of labeled ds-cDNA, we used this quantity for the ds-cDNA method. In the absence of guidelines for the other two labeling methods, we hybridized an amount of cDNA consistent with the respective yield of each method (Table 1). Visual inspection of the resulting slide images revealed differences in the global fluorescence intensity of individual arrays (not shown), the brightest arrays being achieved with the ds-cDNA method whereas the other two methods produced a similar but slightly lower global fluorescence. To compensate for these differences, the arrays were scanned independently in order to adjust the photomultiplier tube (PMT) gain for each array as recommended .
Table 2. Average pair-wise correlation coefficients (± S.D.) of normalized intensities of replicate arrays within a labeling method (n = 6) and across methods (n = 18).
0.995 ± 0.001
0.934 ± 0.006      0.990 ± 0.004
0.62 ± 0.02        0.55 ± 0.02        0.987 ± 0.002
Table 3. Number of statistically significant differentially expressed genes at different confidence intervals.
In order to analyze comparable numbers of differentially expressed genes for the three methods, we chose a 95% confidence interval for the data from the indirect and ds-cDNA methods and a 99% confidence interval for the direct random method (Table 3 and Figure 2B). Using these criteria, a similar proportion (53-56%) of the differentially expressed genes found by any one method was also found by the other methods (Figure 2B). Overlap between the methods increases substantially when comparing the genes with the largest fold-change values (> 2), most of which were found by more than one method (Figure 2C).
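The overlap proportion described above can be computed with simple set operations. The sketch below is illustrative only: the function name and the gene identifiers are hypothetical placeholders, and the real gene lists would come from the statistical analysis of each method.

```python
def shared_fraction(own, *others):
    """Fraction of genes in `own` that also appear in at least one other list."""
    own = set(own)
    if not own:
        raise ValueError("gene list is empty")
    found_elsewhere = own & set().union(*others)
    return len(found_elsewhere) / len(own)


if __name__ == "__main__":
    # Hypothetical gene lists, one per labeling method.
    direct_random = {"geneA", "geneB", "geneC", "geneD"}
    indirect = {"geneB", "geneE"}
    ds_cdna = {"geneC", "geneF"}
    print(shared_fraction(direct_random, indirect, ds_cdna))
```

Calling the function once per method, each time passing the other two lists as `others`, gives one proportion per method, which is the quantity compared across methods in the text.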
The correlation of fold-change values across methods is arguably the best way to compare methods because fold change plays a major role in the selection of gene lists and in the interpretation of the results in a biological context. Furthermore, since each method is very reproducible, it would be expected that the differences between the cDNA populations produced by each type of labeling would be similar in both control and experimental samples and thus produce similar fold-change values.
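As an illustration of this comparison, the following sketch computes log2 fold-change values per gene and their Pearson correlation between two methods. It assumes linear-scale mean intensities keyed by gene identifier; the function names and the example values are hypothetical and are not taken from the data set.

```python
import math


def log2_fold_changes(experimental, control):
    """log2(experimental / control) for every gene present in both samples."""
    return {g: math.log2(experimental[g] / control[g])
            for g in experimental if g in control}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fold_change_correlation(fc_a, fc_b):
    """Correlate fold changes from two methods over their shared genes."""
    shared = sorted(set(fc_a) & set(fc_b))
    return pearson([fc_a[g] for g in shared], [fc_b[g] for g in shared])


if __name__ == "__main__":
    # Hypothetical log2 fold changes from two labeling methods.
    method_a = {"geneA": 1.2, "geneB": -0.4, "geneC": 2.1}
    method_b = {"geneA": 1.0, "geneB": -0.6, "geneC": 2.4}
    print(round(fold_change_correlation(method_a, method_b), 3))
```

If the normalized intensities are already on a log2 scale, the fold change is the difference of the two values rather than the log of their ratio.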
Table 4. Number of differentially expressed genes in selected GO categories. Column headings: (99% - 126 genes), (95% - 157 genes), (95% - 126 genes). Row categories: Response to stress; Response to chemical stimulus.
Comparison with other gene expression technologies
Table 5. Fold-change data comparison between different microarray labeling methods and qPCR for selected genes.
YNL072W (RNH201; Ref1)
YLR185W (RPL37a; Ref2)
Table 6. Fold-change data comparison between different microarray labeling methods and the nCounter technology for selected genes.
It has been shown that the fold-change values obtained in qPCR are highly dependent on the method used , and despite our efforts to use methods considered to be the most accurate, specific genes may still have unpredictable biases making comparison across methods difficult. For example, two genes (RPN4 and PDR3) present in both the qPCR and the nCounter data sets showed relatively different fold-change values with the two methods (Tables 5 and 6). In the absence of known or expected fold-change values in this data set, it is not possible to assign greater accuracy to any of the methods. Furthermore, both our qPCR and nCounter datasets are very small and have been chosen arbitrarily, and no method has shown a clear superiority over the others in estimating fold-change values. However, all three methods broadly produced similar, reproducible results and would be considered suitable sample preparation protocols for microarray workflows.
The goal of this work was to introduce a labeling method that is comparable to currently used protocols but reduces sample cost and labeling time. Despite lower dye incorporation and global fluorescence of the array, the new direct random method provided excellent reproducibility across replicates, possibly because of the minimal manipulations. In order to present a generally useful protocol, no attempt was made to optimize the parameters of the labeling protocol or the amount of labeled cDNA hybridized to the array. We used a typical number of arrays for a given sample comparison in a large-scale screening experiment. Slightly different results could be expected with different source materials, microarray platforms or hybridization conditions, and optimization of certain parameters or a larger number of replicates may be beneficial for particular systems. However, the successful use of the direct random method with eukaryotic RNA samples suggests that this method would be universally applicable independent of the source of RNA. Furthermore, the method could be adapted for samples of limited abundance such as fixed sections, sorted cells or environmental samples, provided that an RNA amplification step is performed before the labeling. The simplicity of our method also makes it amenable to automation using a relatively simple liquid handling robot for very high-throughput microarray applications. In this scheme, 96 RNA samples could be reverse-transcribed, labeled and purified at once, and hybridized to 8 NimbleGen 12-plex microarray slides in a single day by one person.
Biological material and RNA extraction
The S. cerevisiae strains used in this study are engineered strains EPY330 and EPY338 described previously . Briefly, three independent colonies for each strain were pre-cultured in SD-His-Met-Leu medium and used to inoculate 5 ml of YPG medium  for induction at 30°C in culture tubes. Cells were harvested after 24 h of induction by quick centrifugation, immediately frozen in liquid nitrogen and disrupted with glass beads in a bead-beater. Total RNA was then extracted using the RNeasy Mini kit (QIAGEN), including the on-column DNase treatment. RNA was quantified by spectrophotometry with a Nanodrop ND-1000 (Thermo Scientific) and its integrity verified on a 2100 Bioanalyzer (Agilent). Equal amounts of RNA from each replicate culture were pooled for each strain and used for microarray labeling and hybridization, qPCR and the nCounter analysis. All gene expression ratios in this paper are expressed as EPY330/EPY338.
For each labeling reaction, 10 μg of total RNA was used as starting material regardless of the labeling method.
For the new direct random method, 7 μg of 5'Cy3 random nonamers (TriLink Biotechnologies, San Diego, CA) were added to the total RNA to a total volume of 18.5 μl, heat denatured at 70°C for 5 minutes and placed on ice immediately. The remainder of the RT reaction components (6 μl of 5× First-Strand buffer, 1.5 μl of 0.1 M DTT, 1 μl of 10 mM dNTPs, 2 μl of 400 U/μl SuperScriptIII (Invitrogen) and 1 μl of 40 U/μl RNAseOut) were added and the reaction was incubated at 25°C for 5 minutes and at 42°C for 3 h. Template RNA was chemically hydrolyzed by addition of 1 volume of a 200 mM NaOH, 20 mM EDTA solution and incubation at 65°C for 10 minutes. The hydrolysis reaction was neutralized with 1 volume of 1 M HEPES, pH 7.0 and the labeled cDNA was purified on a Qiaquick column (QIAGEN) following the manufacturer's recommendations for PCR purification.
For the ds-cDNA protocol, the labeling method was carried out as recommended by NimbleGen , except that the components for the ds-cDNA synthesis were purchased separately and SuperScriptIII was used for the first strand cDNA synthesis.
For the indirect method, the SuperScript Plus Indirect cDNA Labeling System (Invitrogen) was used with Alexa Fluor 555 reactive dye (Invitrogen) following the manufacturer's recommendations, except for the chemical hydrolysis of RNA and the cDNA purifications, which were carried out as described above for the direct random method with the exception that the kit wash buffer in the first cDNA purification was replaced with 80% ethanol.
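The component volumes of the direct random RT reaction can be checked, and scaled up for a batch, with a few lines of arithmetic. The sketch below uses the volumes and stock concentrations stated above. The 10% pipetting overage and the function names are assumptions for illustration and are not part of the published protocol.

```python
# Per-reaction volumes (ul) of the components added after denaturation,
# as listed in the protocol above.
RT_MIX_UL = {
    "5x First-Strand buffer": 6.0,
    "0.1 M DTT": 1.5,
    "10 mM dNTPs": 1.0,
    "SuperScriptIII (400 U/ul)": 2.0,
    "RNAseOut (40 U/ul)": 1.0,
}
RNA_PLUS_PRIMER_UL = 18.5  # denatured total RNA with 5'Cy3 random nonamers


def total_volume_ul():
    """Final RT reaction volume per sample."""
    return RNA_PLUS_PRIMER_UL + sum(RT_MIX_UL.values())


def final_concentration(stock, volume_ul, total_ul):
    """Concentration after dilution, in the same unit as `stock`."""
    return stock * volume_ul / total_ul


def master_mix(n_reactions, overage=0.10):
    """Volumes (ul) of each component for a batch; `overage` is an assumed excess."""
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 1) for name, vol in RT_MIX_UL.items()}


if __name__ == "__main__":
    total = total_volume_ul()
    print(f"reaction volume: {total} ul")
    print(f"buffer: {final_concentration(5.0, 6.0, total)}x")
    print(f"DTT: {final_concentration(100.0, 1.5, total)} mM")
    print(f"dNTPs: {final_concentration(10.0, 1.0, total):.2f} mM")
    for name, vol in master_mix(96).items():
        print(f"{name}: {vol} ul")
```

With these volumes the reaction totals 30 μl, so the 5× buffer is diluted to 1×. Each sample would then receive 11.5 μl of master mix on top of its 18.5 μl of denatured RNA and primers.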
cDNA yields and dye incorporation were measured with the Nanodrop ND-1000, using a factor of 37 for ss-cDNA or a factor of 50 for ds-cDNA. Amounts of cDNA to be hybridized to each array (Table 1) were aliquoted and dried in a SpeedVac (Thermo Scientific). NimbleGen S. cerevisiae 4-plex expression microarrays (cat. # A6186-00-01) were used, and targets labeled with the different methods were randomly distributed on 5 microarray slides. Hybridization on a 12-bay NimbleGen Hybridization System and array washes were performed as recommended by NimbleGen . Individual array images were acquired independently using a GenePix Professional 4200A scanner (Axon Instruments), adjusting the PMT gain for each image as recommended . Image analysis was performed with the NimbleScan software (NimbleGen), and feature intensities were exported as .pair files. ArrayStar 3.0 (DNASTAR, Madison, WI) was used for probe summarization and normalization (RMA algorithm, quantile normalization), statistical analysis of differentially expressed genes (Student's t-test with Benjamini-Hochberg false discovery rate correction) and gene ontology analysis. The entire microarray data set is available at the Gene Expression Omnibus (accession GSE15816).
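For readers unfamiliar with the false discovery rate correction named above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment. It is not ArrayStar's implementation, which is not described here. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene p-values from a t-test.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Under the assumption that a 95% confidence level corresponds to an adjusted p-value of at most 0.05, genes would be kept when their adjusted value falls at or below that threshold.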
Other gene expression measurements
where E_target is the average PCR efficiency for that target amplicon across all reactions, E_reference is the average efficiency of the two reference genes across all replicates, Ct_target is the Ct obtained for that target gene in a particular replicate, and Ct_reference is the average Ct for the reference genes across all the replicates in that sample. The values of N for each group of sample replicates were submitted to a Student's t-test (2-tailed, independent samples with equal variance) to obtain a p-value.
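The calculation can be sketched as follows, assuming an efficiency-corrected normalization of the form N = E_reference^Ct_reference / E_target^Ct_target. This form is an assumption chosen to be consistent with the definitions above, with efficiencies expressed as the amplification factor per cycle (2.0 for perfect doubling). The function names and example values are hypothetical.

```python
import math
from statistics import mean, variance


def normalized_expression(e_target, ct_target, e_reference, ct_reference):
    """Assumed form: N = E_reference**Ct_reference / E_target**Ct_target."""
    return e_reference ** ct_reference / e_target ** ct_target


def pooled_t_statistic(group_a, group_b):
    """Equal-variance Student's t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    # Hypothetical Ct values for three replicates of each strain.
    n_330 = [normalized_expression(1.95, ct, 1.98, 18.0) for ct in (21.0, 21.2, 20.9)]
    n_338 = [normalized_expression(1.95, ct, 1.98, 18.1) for ct in (23.0, 23.3, 22.8)]
    print("fold change:", mean(n_330) / mean(n_338))
    print("t, df:", pooled_t_statistic(n_330, n_338))
```

The two-tailed p-value follows from the t statistic and the degrees of freedom through the Student's t distribution, for which a statistics package would normally be used.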
Table 7. qPCR primers used in this study.
This work was part of the DOE Joint BioEnergy Institute http://www.jbei.org supported by the U. S. Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U. S. Department of Energy.
- Peeters JK, Spek Van der PJ: Growing applications and advancements in microarray technology and analysis tools. Cell Biochem Biophys. 2005, 43 (1): 149-166. 10.1385/CBB:43:1:149.
- Trevino V, Falciani F, Barrera-Saldana HA: DNA microarrays: a powerful genomic tool for biomedical and clinical research. Mol Med. 2007, 13 (9-10): 527-541. 10.2119/2006-00107.Trevino.
- Brownstein M: Sample Labeling: An Overview. Methods in Enzymology. 2006, 410: 222-237. 10.1016/S0076-6879(06)10011-7.
- Do JH, Choi D-K: cDNA Labeling Strategies for Microarrays Using Fluorescent Dyes. Eng Life Sci. 2007, 7 (1): 26-34. 10.1002/elsc.200620169.
- Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD, Eberwine JH: Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA. 1990, 87 (5): 1663-1667. 10.1073/pnas.87.5.1663.
- Roche-NimbleGen: NimbleGen Arrays User's Guide - Gene Expression Analysis v.3.0. 2008.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan XH, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, LeClerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhong S, Zong Y, Slikker W: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006, 24 (9): 1151-1161. 10.1038/nbt1239.
- Chen JJ, Hsueh HM, Delongchamp RR, Lin CJ, Tsai CA: Reproducibility of microarray data: a further analysis of microarray quality control (MAQC) data. BMC Bioinformatics. 2007, 8: 412. 10.1186/1471-2105-8-412.
- Ro DK, Ouellet M, Paradise EM, Burd H, Eng D, Paddon CJ, Newman JD, Keasling JD: Induction of multiple pleiotropic drug resistance genes in yeast engineered to produce an increased level of anti-malarial drug precursor, artemisinic acid. BMC Biotechnol. 2008, 8 (1): 83. 10.1186/1472-6750-8-83.
- Arikawa E, Sun Y, Wang J, Zhou Q, Ning B, Dial SL, Guo L, Yang J: Cross-platform comparison of SYBR Green real-time PCR with TaqMan PCR, microarrays and other gene expression measurement technologies evaluated in the MicroArray Quality Control (MAQC) study. BMC Genomics. 2008, 9: 328. 10.1186/1471-2164-9-328.
- Geiss GK, Bumgarner RE, Birditt B, Dahl T, Dowidar N, Dunaway DL, Fell HP, Ferree S, George RD, Grogan T, James JJ, Maysuria M, Mitton JD, Oliveri P, Osborn JL, Peng T, Ratcliffe AL, Webster PJ, Davidson EH, Hood L, Dimitrov K: Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol. 2008, 26 (3): 317-325. 10.1038/nbt1385.
- Skern R, Frost P, Nilsen F: Relative transcript quantification by quantitative PCR: roughly right or precisely wrong?. BMC Mol Biol. 2005, 6 (1): 10. 10.1186/1471-2199-6-10.
- Ramakers C, Ruijter JM, Deprez RH, Moorman AF: Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci Lett. 2003, 339 (1): 62-66. 10.1016/S0304-3940(02)01423-4.
- Karlen Y, McNair A, Perseguers S, Mazza C, Mermod N: Statistical significance of quantitative PCR. BMC Bioinformatics. 2007, 8: 10.1186/1471-2105-8-131.
- Pfaffl MW: A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 2001, 29 (9): e45. 10.1093/nar/29.9.e45.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.