Measuring log-ratios on microarrays
Microarray experiments generally measure relative expression levels between biological samples. However, there is a fundamental limit to the changes that can be measured on an array, and understanding that these limits exist is important for analyzing microarray experiments. The limit arises from the manner in which most microarray scanners work. Following hybridization of spectrally distinguishable labeled targets to the arrayed probes on a microarray, the surface of the slide is generally interrogated using one or more lasers, each tuned to excite a particular fluorescent label. The fluorescent light emitted from the surface is collected through an optical system, generally spectrally separated, and focused on a photon detector, usually a photomultiplier tube (PMT). PMTs have a glass photocathode window coated with one or more alkali metals, which gives it a high probability of converting an incoming photon to an electron. The electron emitted from the window is attracted to an alkali metal coated electrode that is maintained at a positive charge. When the initial electron strikes the electrode, it normally releases a number of additional electrons. These are attracted to a series of coated electrodes, each maintained at a slightly higher voltage than the previous one, in effect multiplying the number of electrons released at each subsequent electrode. After a series of these amplification steps, the electrons are collected by a final electrode and the output current is measured. This output current depends on the intensity of the light (i.e. the number of photons) and on the total voltage maintained across the PMT; a higher voltage accelerates electrons more in each step, producing a greater final current. The process is also stochastic, so that the number of electrons produced by each photon can be modeled as a Gaussian distribution with mean μ and standard deviation σ.
As the light intensity increases, the number of photons increases and this changes the distribution: N photons produce approximately Nμ final electrons with a standard deviation of $\sqrt{N}\,\sigma$, so the uncertainty relative to the mean signal falls as $1/\sqrt{N}$. This explains, in part, why signal intensities, and consequently derived measurements such as log-ratios, are more uncertain for genes expressed at lower levels. Finally, the signal from the PMT is converted to a digital signal using an analog-to-digital converter (ADC). Typical array scanners use 16-bit ADCs, giving the instruments an output range of 0 to 65535 ($2^{16}-1$) relative fluorescence units (RFUs) for each pixel. The intensity value reported for each spot on the array varies between research groups and with the software used for image processing. Common measures of expression include background-subtracted mean or median pixel values for each arrayed gene. For the purposes of the analysis presented here, we will use the background-subtracted mean pixel values reported by the TIGR Spotfinder image analysis software [1].
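The scaling of the PMT noise with signal can be illustrated with a short calculation. The sketch below is not part of the original analysis; the function name and the values chosen for μ and σ are assumptions made purely for illustration, and it treats the electrons produced by each photon as independent Gaussian draws, as in the model described above.

```python
import math

ADC_BITS = 16
ADC_MAX = 2 ** ADC_BITS - 1  # largest value a 16-bit ADC can report (65535)


def pmt_signal_stats(n_photons, mu, sigma):
    """Mean, standard deviation, and relative uncertainty (sd / mean)
    of the total electron count produced by n_photons independent photons."""
    mean = n_photons * mu
    sd = math.sqrt(n_photons) * sigma
    return mean, sd, sd / mean


# mu and sigma are illustrative values only, not measured PMT parameters.
for n in (10, 1_000, 100_000):
    mean, sd, rel = pmt_signal_stats(n, mu=1.0e6, sigma=2.0e5)
    print(f"N={n:>7}: mean={mean:.3g}, sd={sd:.3g}, relative sd={rel:.4f}")
```

Under these assumptions, each 100-fold increase in the number of photons reduces the relative uncertainty 10-fold, which is consistent with the broader spread of log-ratios seen for weakly expressed genes.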
Microarray assays are often used to compare expression levels between paired samples and, for a variety of reasons, these comparisons are typically expressed for each gene as the logarithm base 2 of the ratio of the (background-subtracted) fluorescent signals measured from each labeled sample [$\log_2(R/G)$]; we refer to these as log-ratios. Because the fluorescent dyes used in most microarray assays have slightly different efficiencies for light emission, because the detection efficiencies of the phototubes have some wavelength dependence and hence differ for the different dyes, and because the PMTs exhibit nonlinearities at high and low intensities, the measured log-ratios often exhibit some systematic, intensity-dependent variation. This systematic error is most easily visualized using a Ratio-Intensity (R-I) plot ([2, 3]; also called an MA plot by Speed and colleagues) in which the log-ratio for each spot is plotted as a function of one-half the logarithm of the product of the measured intensities, $\tfrac{1}{2}\log_2(R \cdot G)$, which is equivalent to the logarithm of the geometric mean of the intensities for that gene, a measure of the relative expression level of a particular gene. The shape of the distribution one observes in an R-I plot depends in a fundamental way on the experimental design one chooses, as that defines the comparisons that are made. For closely related samples where one expects gene expression to be highly similar, the distribution of log-ratio values is broad at lower intensities, reflecting the greater relative uncertainty as one approaches the detection limits in one or both channels, while it narrows at higher expression levels (Figure 1A,1B); for biologically diverse samples the R-I plot can present a very different profile (Figure 1C,1D,1E,1F).
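As a concrete illustration of the two R-I plot coordinates, the following minimal sketch computes them for a single spot. The function name `ri_coordinates` is hypothetical and the input intensities are made-up values; only the two formulas come from the text.

```python
import math


def ri_coordinates(red, green):
    """Return (mean_log_intensity, log_ratio) for one spot.

    mean_log_intensity = 0.5 * log2(R * G)   (horizontal axis of the R-I plot)
    log_ratio          = log2(R / G)         (vertical axis of the R-I plot)
    Both intensities must be positive, background-subtracted values.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    log_ratio = math.log2(red / green)
    mean_log_intensity = 0.5 * math.log2(red * green)
    return mean_log_intensity, log_ratio


# Made-up example: a spot four times brighter in red than in green.
a, m = ri_coordinates(red=1024, green=256)
print(a, m)
```

Spots with non-positive background-subtracted intensities cannot be placed on the plot at all, which is one reason analysis software often floors or filters such values.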
The R-I plot can also reveal some of the limitations of using log-ratios as a measure of expression. As described previously, the 16-bit ADCs in microarray scanners limit the maximum intensity that can be measured in both the red and green channels on an array, such that the $\log_2(R)$ and $\log_2(G)$ values range independently between a minimum of 0 and a maximum of 16. One can visualize this as a square box in a plot of $\log_2(R)$ versus $\log_2(G)$, or as a diamond-shaped area in an R-I plot (Figure 2A). This relationship arises because the R-I plot is essentially a 45° (π/4) rotation (and slight rescaling) of the log-intensity plot, where the square represents the limits defined by each of the two independent fluorescence measurements (Figure 2B).
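The diamond-shaped boundary follows directly from the two channel limits. Writing A for the horizontal R-I coordinate (one-half the log of the product of the intensities) and M for the log-ratio, the channel values are log2(R) = A + M/2 and log2(G) = A - M/2; requiring both to lie between 0 and 16 gives |M| ≤ 2·min(A, 16 - A). The sketch below encodes that bound; the symbols A and M and the function name are labels introduced here for illustration, not notation from the original text.

```python
ADC_BITS = 16  # log2 of the nominal upper intensity limit of a 16-bit scanner


def max_abs_log_ratio(mean_log_intensity, max_log=ADC_BITS):
    """Largest |log2(R/G)| that can be observed at a given value of
    A = 0.5 * log2(R * G), when log2(R) and log2(G) each lie in [0, max_log]."""
    if not 0 <= mean_log_intensity <= max_log:
        raise ValueError("mean log intensity lies outside the measurable range")
    return 2 * min(mean_log_intensity, max_log - mean_log_intensity)


# The measurable log-ratio range is widest mid-scale and shrinks to zero
# at either end of the intensity axis.
for a in (0, 2, 8, 14, 16):
    print(a, max_abs_log_ratio(a))
```

In this idealized picture the widest range, |M| ≤ 16, is available only at A = 8; a spot with A = 15 cannot show a log-ratio larger than 2 (a four-fold change) regardless of the true expression difference.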
Because most microarray image analysis software performs a background subtraction and uses other methods to avoid saturation of pixels, the reported fluorescence signals normally do not reach the absolute limit of detection. The background-subtracted data we use for analysis exhibit that effect in hybridization assays where the fluorescence signal is particularly strong (Figure 1B,1D,1F). Similar effects can be seen as the signal intensity decreases toward the lower limit, where the discrete integer values assigned to genes expressed at low levels appear as diagonal "whiskers" in the R-I plot (Figure 3); this often arises as a result of setting expression values below some threshold to a minimal value, a process referred to as "flooring."
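One way to see why fixed low values produce diagonal whiskers: if one channel is held at a constant value k (for example, a floor value) while the other varies, every such spot satisfies log-ratio = 2·(A - log2 k), where A is one-half the log of the product of the intensities, which is a straight line of slope 2 in the R-I plot. The sketch below checks this numerically. It assumes, for illustration only, that the green channel is the one held fixed; the function name and the intensity values are hypothetical.

```python
import math


def whisker_points(fixed_green, red_values):
    """R-I coordinates (A, M) for spots whose green intensity is pinned
    at fixed_green while the red intensity takes each value in red_values."""
    points = []
    for red in red_values:
        a = 0.5 * math.log2(red * fixed_green)  # horizontal coordinate
        m = math.log2(red / fixed_green)        # log-ratio
        points.append((a, m))
    return points


# Hypothetical floor value of 2 RFU in green, with a range of red intensities.
k = 2
for a, m in whisker_points(k, [2, 4, 16, 128, 1024]):
    # Each point lies on the line M = 2 * (A - log2(k)).
    print(f"A={a:.2f}  M={m:.2f}  line={2 * (a - math.log2(k)):.2f}")
```

Each distinct small integer value of the pinned channel gives its own parallel line, and pinning the red channel instead gives lines of slope -2, so the whiskers appear in both directions.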
This effect limits the dynamic range of "fold-change" (equivalent to the log-ratio) measurements on arrays, particularly as the measured intensities approach either the minimum or the maximum level detectable on a particular array scanner. These limits are not unique to dual-color detection techniques. Comparisons made using single-color microarrays are also limited by the dynamic range of the individual measurements, and the fold-change estimates derived from such comparisons exhibit exactly the same type of artifact.