The limits of log-ratios

Sharov, Vasily; Kwong, Ka Yin; Frank, Bryan; Chen, Emily; Hasseman, Jeremy; Gaspard, Renee; Yu, Yan; Yang, Ivana; Quackenbush, John

doi:10.1186/1472-6750-4-3

Methodology article
Open access
Published: 08 March 2004

BMC Biotechnology volume 4, Article number: 3 (2004)

Abstract

Background

DNA microarray assays typically compare two biological samples and present the results of those comparisons gene-by-gene as the logarithm base two of the ratio of the measured expression levels for the two samples.

Results

Because of the fixed dynamic range of fluorescence and other detection systems, there is a limit to the range of comparisons that can be made using any array technology, and this must be taken into account when interpreting the results of any such analysis.

Conclusions

The dynamic range of microarray data collection systems imposes limits on the comparative analyses that can be derived from such measurements and suggests that optimal results can be obtained by making measurements that avoid the boundaries of that dynamic range.

Background

DNA microarray analysis has become one of the most widely used techniques in modern molecular genetics, and the laboratory protocols that have developed in recent years have led to increasingly robust assays. The application of microarray technologies affords great opportunities for exploring patterns of gene expression and allows users to begin investigating problems ranging from deducing biological pathways to classifying patient populations.

As with all assays, the starting point for developing a microarray study is planning the comparisons that will be made. The simplest experimental designs are based on the comparative analysis of two classes of samples, using either a series of paired case-control comparisons or comparisons to a common reference sample, although other approaches have been described. In either case, the fundamental question addressed using arrays is generally a comparison between paired samples to find genes that differ significantly in their patterns of expression. For the analysis presented here, we focus on direct pair-wise comparisons between samples using spotted DNA arrays in dual-labeled co-hybridization assays. However, the results we present here also bear on other analyses, including relative changes inferred by comparison to a reference sample or through more complex loop designs, and comparisons between single-color assays such as those commonly performed using the Affymetrix GeneChip™ or filter array platforms.

Results and Discussion

Measuring log-ratios on microarrays

Microarray experiments generally measure relative expression levels between biological samples. However, there is a fundamental limit to the changes that can be measured on an array, and understanding that these limits exist is important for analyzing microarray experiments. This limit depends fundamentally on the manner in which most microarray scanners work.

Following hybridization of spectrally distinguishable labeled targets to the arrayed probes on a microarray, the surface of the slide is generally interrogated using one or more lasers, each tuned to excite a particular fluorescent label. The fluorescent light emitted from the surface is collected through an optical system, generally spectrally separated, and focused on a photon detector, usually a photomultiplier tube (PMT). PMTs have a glass photocathode window coated with one or more alkali metals that have a high probability of converting an incoming photon to an electron. The electron emitted from the window is attracted to an alkali-metal-coated electrode that is maintained at a positive potential. When the initial electron strikes the electrode, it normally releases a number of additional electrons. These are attracted to a series of coated electrodes, each maintained at a slightly higher voltage than the previous one, so that the number of electrons is multiplied at each successive electrode. After a series of these amplification steps, the electrons are collected by a final electrode and the output current is measured. This output current depends on the intensity of the light (i.e. the number of photons) and on the total voltage maintained across the PMT: a higher voltage accelerates the electrons more in each step, producing a greater final current. This process is also stochastic: the number of electrons produced by each photon can be modeled as a Gaussian random variable with mean μ and standard deviation σ.
As the light intensity increases, the number of photons increases, and this changes the distribution: N photons produce approximately Nμ final electrons with a standard deviation of $\sqrt{N}\,\sigma$, because the variances of independent contributions add. The relative uncertainty, $\sigma/(\mu\sqrt{N})$, therefore shrinks as the signal grows, which partly explains why signal intensities, and consequently derived measurements such as log-ratios, are more uncertain for genes expressed at lower levels. Finally, the signal from the PMT is converted to a digital signal using an analog-to-digital converter (ADC). Typical array scanners use 16-bit ADCs, giving the instruments an output range of 0 to 65535 (2¹⁶ − 1) relative fluorescence units (RFUs) for each pixel. The intensity value reported for each spot on the array varies between research groups and with the image-processing software used; common measures of expression include the background-subtracted mean or median pixel value for each arrayed gene. For the purposes of the analysis presented here, we use the background-subtracted mean pixel values reported by the TIGR Spotfinder image analysis software [1].
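The simplified photon-to-electron model above can be checked numerically. The sketch below assumes, as the text does, that every photon independently yields a Gaussian number of electrons with mean μ and standard deviation σ, and it holds the photon count N fixed (the randomness of photon arrival itself is ignored here). The parameter values and the helper names `total_electrons` and `summarize` are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def total_electrons(n_photons, mu, sigma, rng):
    """Sum of per-photon electron yields, each drawn independently from
    Normal(mu, sigma), following the simplified PMT model in the text."""
    return sum(rng.gauss(mu, sigma) for _ in range(n_photons))


def summarize(n_photons, mu, sigma, n_trials=2000, seed=0):
    """Return (mean, sd, cv) of the total signal over repeated simulated reads."""
    rng = random.Random(seed)
    totals = [total_electrons(n_photons, mu, sigma, rng) for _ in range(n_trials)]
    mean = statistics.fmean(totals)
    sd = statistics.stdev(totals)
    return mean, sd, sd / mean


mu, sigma = 10.0, 3.0  # hypothetical per-photon gain parameters (arbitrary units)
for n in (4, 16, 64, 256):
    mean, sd, cv = summarize(n, mu, sigma)
    # Theory: mean ~ N*mu, sd ~ sqrt(N)*sigma, so CV ~ sigma / (mu * sqrt(N)).
    print(f"N={n:4d}  mean={mean:8.1f} (N*mu={n * mu:6.0f})  "
          f"sd={sd:6.2f} (sqrt(N)*sigma={math.sqrt(n) * sigma:5.1f})  "
          f"CV={cv:.3f} (theory {sigma / (mu * math.sqrt(n)):.3f})")
```

As N grows, the simulated mean and standard deviation track Nμ and √N σ, and the coefficient of variation falls as 1/√N; this is the behavior invoked above to explain the larger scatter of low-intensity measurements.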

Microarray assays are often used to compare expression levels between paired samples, and for a variety of reasons these comparisons are typically expressed for each gene as the logarithm base 2 of the ratio of the (background-subtracted) fluorescent signals measured from each labeled sample, $\log_2(R/G)$; we refer to these as log-ratios. Because the fluorescent dyes used in most microarray assays have slightly different light-emission efficiencies, because the detection efficiency of the PMTs depends on wavelength and hence differs between dyes, and because the PMTs exhibit nonlinearities at high and low intensities, the measured log-ratios often exhibit some systematic, intensity-dependent variation. This systematic error is most easily visualized using a Ratio-Intensity (R-I) plot ([2, 3]; also called an MA plot by Speed and colleagues), in which the log-ratio for each spot is plotted as a function of one-half the logarithm of the product of the measured intensities, $\tfrac{1}{2}\log_2(R \cdot G)$. This quantity equals the logarithm of the geometric mean of the two intensities, $\log_2\sqrt{R \cdot G}$, and serves as a measure of the relative expression level of a particular gene. The shape of the distribution observed in an R-I plot depends fundamentally on the chosen experimental design, because the design defines the comparisons that are made. For closely related samples, where one expects gene expression to be highly similar, the distribution of log-ratio values is broad at lower intensities, reflecting the greater relative uncertainty as one approaches the detection limits in one or both channels, and narrows at higher expression levels (Figure 1A,1B); for biologically diverse samples, the R-I plot can present a very different profile (Figure 1C,1D,1E,1F).
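For concreteness, a minimal Python sketch of the two R-I coordinates for a single spot, assuming background-subtracted intensities that are already positive; the function name `ratio_intensity` is hypothetical.

```python
import math


def ratio_intensity(r, g):
    """Return (M, A) for one spot: M = log2(R/G), A = 0.5*log2(R*G).

    r and g are background-subtracted intensities; both must be positive,
    since non-positive values have no logarithm and would need flooring first.
    """
    if r <= 0 or g <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)
    return m, a


# A equals log2 of the geometric mean sqrt(R*G):
m, a = ratio_intensity(4096.0, 1024.0)
print(m, a, math.log2(math.sqrt(4096.0 * 1024.0)))  # prints 2.0 11.0 11.0
```

The last two printed values agree, confirming that ½ log₂(R·G) is the log₂ of the geometric mean. Inverting the definitions gives R = 2^(A + M/2) and G = 2^(A − M/2), which is useful when reasoning about the intensity limits of each channel.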

The R-I plot can also reveal some of the limitations of using log-ratios as a measure of expression. As described previously, the 16-bit ADCs in microarray scanners limit the maximum intensity that can be measured in both the red and green channels, so that log₂(R) and log₂(G) each range independently between a minimum of 0 and a maximum of (just under) 16. One can visualize this as a square box in a plot of log₂(R) versus log₂(G), or as a diamond-shaped area in an R-I plot (Figure 2A). This relationship arises because the R-I plot is essentially a 45° (π/4) rotation (with slight rescaling) of the log-intensity plot, in which the square represents the limits defined by the two independent fluorescence measurements (Figure 2B).
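The shape of this boundary can be derived directly. In the short derivation below, $x$ and $y$ denote the two channel log-intensities and $M$, $A$ the R-I coordinates (symbols introduced here for convenience), and the idealized upper bound of 16 used in the text is assumed.

```latex
x = \log_2 R, \quad y = \log_2 G, \qquad 0 \le x \le 16, \quad 0 \le y \le 16

M = x - y, \qquad A = \tfrac{1}{2}(x + y)
\quad\Longleftrightarrow\quad
x = A + \tfrac{1}{2}M, \qquad y = A - \tfrac{1}{2}M

0 \le A \pm \tfrac{1}{2}M \le 16
\;\Longrightarrow\;
|M| \le \min(2A,\; 32 - 2A) = 2\min(A,\; 16 - A)

\max_{A}\, 2\min(A,\; 16 - A) = 16 \quad \text{at } A = 8,
\qquad \text{vertices } (A, M) \in \{(0,0),\ (8,16),\ (16,0),\ (8,-16)\}
```

The map from $(x, y)$ to $(A, M)$ is linear: apart from a reflection of the $M$ axis, it is a 45° rotation combined with a rescaling by $1/\sqrt{2}$ along $A$ and $\sqrt{2}$ along $M$, which is why the square becomes a diamond whose widest range of log-ratios lies at $A = 8$.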

Because most microarray image analysis software performs a background subtraction and uses other methods to avoid pixel saturation, the reported fluorescence signals normally do not reach the absolute limit of detection. Nonetheless, the background-subtracted data we use for analysis still show the effect of this boundary in hybridization assays where the fluorescence signal is particularly strong (Figure 1B,1D,1F). Similar effects can be seen as the signal intensity decreases toward the lower limit, where the discrete integer values assigned to genes expressed at low levels appear as diagonal "whiskers" in the R-I plot (Figure 3); these often arise from setting expression values below some threshold to a minimal value, a process referred to as "flooring."
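The diagonal whiskers follow from the R-I coordinates themselves: if one channel is pinned at a fixed value G (for example, a floor), then log₂G = A − M/2 is constant, and every such spot lies on the line M = 2A − 2 log₂G, which has slope 2. The sketch below illustrates this with a hypothetical flooring rule and made-up intensities; `FLOOR`, `floor_values`, and the data are illustrative assumptions, not the behavior of any particular image analysis package.

```python
import math


def ma(r, g):
    """Return (M, A) = (log2(R/G), 0.5*log2(R*G)) for positive intensities."""
    return math.log2(r / g), 0.5 * math.log2(r * g)


def floor_values(values, floor):
    """Hypothetical flooring rule: anything below `floor` is reset to `floor`."""
    return [max(v, floor) for v in values]


FLOOR = 10.0  # hypothetical floor in RFU, chosen only for illustration

# Red varies over a wide range; green is weak: the first four values fall
# below the floor, the last three share the small integer value 11.
red = [50.0, 200.0, 800.0, 3200.0, 60.0, 240.0, 960.0]
green = floor_values([2.0, 7.0, 0.0, 9.0, 11.0, 11.0, 11.0], FLOOR)

for r, g in zip(red, green):
    m, a = ma(r, g)
    # With G fixed, M - 2A = -2*log2(G): all spots sharing one G value
    # fall on a single line of slope 2 (one "whisker").
    print(f"G={g:5.1f}  A={a:6.3f}  M={m:7.3f}  M-2A={m - 2 * a:7.3f}")
```

The four floored spots share one value of M − 2A and the three spots with G = 11 share another, so in an R-I plot they form two parallel diagonal streaks; mirror-image streaks of slope −2 arise when the red channel is pinned instead.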

This effect limits the dynamic range of fold-change measurements on arrays (the fold change being the ratio whose base-2 logarithm is the log-ratio), particularly as the measured intensities approach the minimum or maximum levels detectable on a particular array scanner. These limits are not unique to dual-color detection techniques: comparisons made using single-color microarrays are also limited by the dynamic range of the individual measurements, and fold-change estimates derived from such comparisons exhibit exactly the same type of artifact.

Conclusions

The simple analysis presented here suggests a possible limitation on the use of fold-change measurements derived from microarrays and argues for the use of R-I plots as a means of detecting possible deviations from the dynamic range of the assay. Further, these results suggest that rather than trying to maximize signal on the fluorescent images from the array, a better approach would be to target background-subtracted fluorescent intensities to the middle of the range, where the dynamic range for fold-change measurements is maximized, that is, a log-intensity $\tfrac{1}{2}\log_2(R \cdot G)$ of 8. However, this corresponds to an average (geometric mean) expression measurement of only 256 RFUs, which on most arrays is uncomfortably close to background. In practice, a log-intensity of 10 to 12 (a geometric mean of 1024 to 4096 RFUs) strikes a good balance between intensities that are too close to background and those that approach the limits of the dynamic range of the assay. While the raw images from these arrays may not provide as pretty a picture of the hybridization assay, they are more likely to provide useful data that can be validated.
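These targets can be tabulated from the diamond-shaped boundary of the R-I plot. A minimal sketch, assuming the idealized channel limits of 0 and 16 (in log₂ units) used in the text and ignoring background and flooring, which shrink the usable range further in practice; `max_abs_log_ratio` is a hypothetical helper.

```python
ADC_BITS = 16  # 16-bit ADC, as described in the text


def max_abs_log_ratio(a, log2_max=ADC_BITS):
    """Largest |log2(R/G)| representable at log-intensity A = 0.5*log2(R*G),
    assuming each channel spans log2 values from 0 to log2_max, so that
    |M| <= 2 * min(A, log2_max - A)."""
    if not 0 <= a <= log2_max:
        raise ValueError("A lies outside the measurable range")
    return 2 * min(a, log2_max - a)


for a in (8, 10, 12):
    geo_mean = 2 ** a  # geometric mean intensity sqrt(R*G), in RFU
    m_max = max_abs_log_ratio(a)
    print(f"A={a:2d}  geometric mean={geo_mean:5d} RFU  "
          f"max |log2 ratio|={m_max:2d}  max fold change={2 ** m_max}")
```

Under these assumptions, a log-intensity of 8 (256 RFUs) leaves room for log-ratios up to ±16, while 10 (1024 RFUs) and 12 (4096 RFUs) leave ±12 and ±8, i.e. fold changes of up to 4096 and 256. The trade-off described above is between this shrinking headroom and distance from background.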

References

[1] Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: A Free, Open Source System for Microarray Data Management and Analysis. Biotechniques. 2003, 374-378.
[2] Yang IV, Chen E, Hasseman JP, Liang W, Frank BC, Wang S, Sharov V, Saeed AI, White J, Li J, Lee NH, Yeatman TJ, Quackenbush J: Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol. 2002, 3: research0062.
[3] Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30: e15. doi:10.1093/nar/30.4.e15.

Acknowledgments

This work was supported by grants from the National Heart, Lung, and Blood Institute (NIH 1 U01 HL66580-01 and NIH-1 R33 HL3712-01), the US National Cancer Institute (NIH-U01-CA8552-01A1), and the National Science Foundation (NSF-DBI9975920 and NSF-DBI-0177281).

Author information

Authors and Affiliations

The Institute for Genomic Research, Rockville, MD, USA
Vasily Sharov, Ka Yin Kwong, Bryan Frank, Emily Chen, Jeremy Hasseman, Renee Gaspard, Yan Yu, Ivana Yang & John Quackenbush
Department of Biochemistry, The George Washington University School of Medicine, Washington, DC, USA
John Quackenbush
Department of Biostatistics, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD, USA
John Quackenbush

Corresponding author

Correspondence to John Quackenbush.

Additional information

Authors' contributions

VS and JQ are responsible for drafting the manuscript and producing the final version. KYK, BF, EC, JH, RG, YY, and IY contributed data and participated in its analysis. All authors read and approved the final manuscript.

About this article

Cite this article

Sharov, V., Kwong, K.Y., Frank, B. et al. The limits of log-ratios. BMC Biotechnol 4, 3 (2004). https://doi.org/10.1186/1472-6750-4-3

Received: 15 November 2003
Accepted: 08 March 2004
Published: 08 March 2004
DOI: https://doi.org/10.1186/1472-6750-4-3
