The case for well-conducted experiments to validate statistical protocols for 2D gels: different pre-processing = different lists of significant proteins

Table 5 The effect of log transformation using non-normalized data.

No Log Transform, No Normalization, Missing replaced with zero (54% of identified spots picked up)	No Log Transform, Normalization 1, Missing replaced with zero (54% of identified spots picked up)	No Log Transform, Normalization 2, Missing replaced with zero (54% of identified spots picked up)	No Log Transform, Normalized-PDQUEST (64% of identified spots picked up)
			SSP 0312¹
			SSP 1112¹
			SSP 1309
			SSP 1321¹
			SSP 1331¹
SSP 1509	SSP 1509	SSP 1509
SSP 1733	SSP 1733
			SSP 2307
		SSP 2309	SSP 2309
			SSP 3234¹
			SSP 3437¹
			SSP 3523¹
SSP 4225	SSP 4225	SSP 4225	SSP 4225
SSP 4435	SSP 4435	SSP 4435
SSP 4438	SSP 4438		SSP 4438²
			SSP 4517¹
SSP 4519	SSP 4519	SSP 4519	SSP 4519²
			SSP 4637²
SSP 4724	SSP 4724		SSP 4724
			SSP 4735¹
			SSP 5011¹
		SSP 5309
		SSP 5329
SSP 5413	SSP 5413	SSP 5413	SSP 5413
		SSP 6205
		SSP 6304
SSP 6314		SSP 6314	SSP 6314
		SSP 6321
		SSP 6349
		SSP 6443
SSP 6452	SSP 6452	SSP 6452	SSP 6452
		SSP 7027
			SSP 7231
		SSP 7223
		SSP 7334
			SSP 7413¹
		SSP 7750
		SSP 8613

¹ These are spots that were present in a very small number of gels, and therefore did not meet our criteria to be included.
² These spots have highly skewed distributions or were very poor quality spots. Log transformation brought the distribution closer to normal, and the p-values were no longer significant.
Column 1 has spots with significantly different intensities (p = 0.05) before normalizing and log transforming the data. Column 2 has spots that are significantly different in intensity after normalization 1, but before a log transformation. Column 3 has spots that are significantly different in intensity after normalization 2, but before a log transformation. Column 4 has the results from the image analysis software PDQUEST, which has an option for normalizing but none for log transformation. Columns 1 and 2 are subsets of the 201 spots in the final data set that met our criteria for inclusion. Column 3 is a subset of all possible spots in the experiment. Spots in bold were later identified by MALDI-TOF; all of these were biologically relevant to the system being studied. The percentages in parentheses in the header give the share of the ten proteins known to be different that were identified after each normalization technique.
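The overlap between the four columns can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the spot numbers are transcribed from the table above with footnote markers dropped, and the column keys (`no_norm`, `norm_1`, `norm_2`, `pdquest`) and helper function names are shorthand introduced here for illustration.

```python
# Spot (SSP) numbers transcribed from Table 5; footnote markers are dropped.
# The dictionary keys are hypothetical shorthand for the four column headers.
columns = {
    "no_norm": {"1509", "1733", "4225", "4435", "4438", "4519", "4724",
                "5413", "6314", "6452"},
    "norm_1": {"1509", "1733", "4225", "4435", "4438", "4519", "4724",
               "5413", "6452"},
    "norm_2": {"1509", "2309", "4225", "4435", "4519", "5309", "5329",
               "5413", "6205", "6304", "6314", "6321", "6349", "6443",
               "6452", "7027", "7223", "7334", "7750", "8613"},
    "pdquest": {"0312", "1112", "1309", "1321", "1331", "2307", "2309",
                "3234", "3437", "3523", "4225", "4438", "4517", "4519",
                "4637", "4724", "4735", "5011", "5413", "6314", "6452",
                "7231", "7413"},
}


def shared_by_all(cols):
    """Spots flagged as significant under every pre-processing variant."""
    return set.intersection(*cols.values())


def unique_to(name, cols):
    """Spots flagged only under the named pre-processing variant."""
    others = set().union(*(s for key, s in cols.items() if key != name))
    return cols[name] - others


if __name__ == "__main__":
    print("Shared by all columns:", sorted(shared_by_all(columns)))
    for name, spots in columns.items():
        print(f"{name}: {len(spots)} spots, "
              f"{len(unique_to(name, columns))} found only in this column")
```

Under this transcription, only a handful of spots appear in all four columns, while most of the spots in the normalization 2 and PDQUEST columns appear in no other column.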

ISSN: 1472-6750