Cell Culture
The 293FT cell line was maintained in DMEM supplemented with 100 mL/L fetal calf serum, 2 mmol/L L-glutamine, 100 units/mL penicillin and 100 μg/mL streptomycin. The cells were incubated at 37°C in a humidified incubator containing 50 mL/L CO2. Cell viability was estimated by the trypan blue dye exclusion method. The 293FT cells were seeded into 24-well plates 24 h before transfection, and three wells of cells were transfected for every experiment. The cells were transfected using LipofectAMINE 2000 (abbreviated as LipofectAMINE) cationic liposome (Invitrogen, Carlsbad, California, USA) and harvested 36 h after transfection. Transfection efficiency was evaluated as the percentage of cells expressing green fluorescent protein (GFP), measured with a flow cytometer (COULTER EPICS XL, Beckman, USA). The experiments were performed in duplicate.
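As a minimal sketch of the efficiency calculation, assuming that efficiency is the percentage of GFP-positive events among all counted events in a well and that the three transfected wells are summarized by their mean and sample standard deviation (the counts below are hypothetical, not data from this study):

```python
from statistics import mean, stdev


def transfection_efficiency(gfp_positive, total_events):
    """Percentage of GFP-positive events among all counted events in one well."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return 100.0 * gfp_positive / total_events


# Hypothetical flow-cytometry counts for three wells (GFP-positive, total); not data from this study
wells = [(4200, 10000), (3900, 10000), (4500, 10000)]
efficiencies = [transfection_efficiency(g, t) for g, t in wells]
print(f"mean = {mean(efficiencies):.1f}%, SD = {stdev(efficiencies):.1f}%")  # mean = 42.0%, SD = 3.0%
```

How events are gated as GFP-positive is instrument-specific and is not modeled here.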
Uniform Design
Building on orthogonal design, uniform design (UD) was proposed as a new experimental design method by Fang et al. in the 1980s. Unlike orthogonal design, UD does not require regular comparability. It aims solely at uniformity, scattering the test points evenly over the whole experimental domain. UD therefore finds representative points that are uniformly scattered over the sample space, which makes the parameter search more efficient. It is a type of space-filling design and is also used in computer experiments. Suppose there are $s$ factors of interest over a domain $C^s$. The goal is to choose a set of $m$ points $P_m = \{\theta_1, \dots, \theta_m\} \subset C^s$ such that these points are uniformly scattered on $C^s$.
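One common way to construct such a point set is the good lattice point (glp) method, with candidate designs ranked by the centered L2-discrepancy (CD2), a standard uniformity measure. The Python sketch below illustrates this for 15 runs and 3 factors. It is not necessarily how the design in Table 1 was obtained, and the exhaustive search over generating vectors is only practical for small numbers of runs and factors.

```python
import itertools
import math

import numpy as np


def glp_design(n, h):
    """Good lattice point design U_n(n^s): entry (i, j) = i * h_j mod n, with 0 replaced by n."""
    i = np.arange(1, n + 1)[:, None]            # run index 1..n as a column
    levels = (i * np.asarray(h)[None, :]) % n   # one column per generator h_j
    levels[levels == 0] = n
    return levels                                # integer levels 1..n


def centered_l2_discrepancy(levels, n):
    """Hickernell's centered L2-discrepancy of a design with levels 1..n (smaller = more uniform)."""
    x = (levels - 0.5) / n                      # map levels to cell midpoints in [0, 1]^s
    s = x.shape[1]
    d = np.abs(x - 0.5)
    term1 = (13.0 / 12.0) ** s
    term2 = (2.0 / n) * np.sum(np.prod(1 + 0.5 * d - 0.5 * d ** 2, axis=1))
    diff = np.abs(x[:, None, :] - x[None, :, :])
    pair = 1 + 0.5 * d[:, None, :] + 0.5 * d[None, :, :] - 0.5 * diff
    term3 = np.sum(np.prod(pair, axis=2)) / n ** 2
    return math.sqrt(term1 - term2 + term3)


def uniform_design(n, s):
    """Try every generating vector of s distinct integers coprime to n; keep the lowest CD2."""
    candidates = [h for h in range(1, n) if math.gcd(h, n) == 1]
    best = None
    for h in itertools.combinations(candidates, s):
        design = glp_design(n, h)
        cd = centered_l2_discrepancy(design, n)
        if best is None or cd < best[0]:
            best = (cd, h, design)
    return best


cd, h, design = uniform_design(15, 3)   # 15 runs, 3 factors, 15 levels each
print("generating vector:", h, "CD2:", round(cd, 4))
print(design)
```

Because each generator is coprime to the number of runs, every column is a permutation of the levels 1 to 15, so each level of each factor is tested exactly once.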
Experiments
In a protocol consisting of 15 experiments, the amount of liposome, the amount of plasmid, and the number of seeded cells were set as independent variables, and transfection efficiency was set as the dependent variable. Each independent variable had 15 levels, and the ranges of the independent variables were set according to the manufacturer's instructions. The protocol followed the principle of UD (Table 1). Each transfection efficiency (dependent variable) was measured by flow cytometry, and GFP expression in each experiment was also observed under a fluorescence microscope (Eclipse 80I, Nikon, Tokyo, Japan). A model was then constructed using LS-SVM, and the fitted value corresponding to each measured transfection efficiency was computed from the established model. A second protocol of 10 experiments was designed around the predicted optimal conditions, at which the dependent variable was expected to reach its maximum (Table 2). The observed GFP expression is shown in Figure 1 and Figure 2. All observed data in Table 1 and Table 2 are mean values of three independent experiments.
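The sketch below shows how a 15-run, three-factor design with 15 levels per factor could be mapped onto actual quantities by spacing the levels evenly within each factor's range. The generating vector (1, 4, 7) and the factor ranges are hypothetical placeholders, not the values of Table 1 or the manufacturer's recommendations.

```python
import numpy as np

# Hypothetical factor ranges (placeholders, NOT the ranges used in this study):
# LipofectAMINE (uL), plasmid (ug), seeded cells (x10^4 per well)
RANGES = [(0.5, 4.0), (0.2, 1.6), (2.0, 16.0)]


def glp_levels(n, h):
    """Levels 1..n of a good lattice point design with generating vector h (illustrative only)."""
    i = np.arange(1, n + 1)[:, None]
    levels = (i * np.asarray(h)[None, :]) % n
    levels[levels == 0] = n
    return levels


def levels_to_values(levels, ranges, q):
    """Map level k in 1..q to low + (k - 1) * (high - low) / (q - 1) for each factor."""
    lows = np.array([lo for lo, _ in ranges])
    highs = np.array([hi for _, hi in ranges])
    return lows + (np.asarray(levels, dtype=float) - 1) / (q - 1) * (highs - lows)


levels = glp_levels(15, (1, 4, 7))
values = levels_to_values(levels, RANGES, q=15)
for run, row in enumerate(values, start=1):
    print(run, np.round(row, 3))
```

The same mapping could serve a refinement round such as the 10-run protocol by passing narrower ranges centered on the predicted optimum.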
Development of the LS-SVM based models for prediction of transfection efficiency
In regression, the goal is to estimate an unknown continuous-valued function from a finite set of noisy samples $(x_i, y_i)$, $i = 1, \dots, n$, where the $d$-dimensional input is $x \in \mathbb{R}^d$ and the output is $y \in \mathbb{R}$. In SVM regression, the input $x$ is first mapped into an $m$-dimensional feature space by a fixed (nonlinear) mapping, and a linear model is then constructed in this feature space [8]. In mathematical notation, the linear model (in the feature space) $f(x, \omega)$ is given by Equation (1), where $g_j(x)$, $j = 1, \dots, m$, denotes a set of nonlinear transformations and $b$ is the "bias" term:

$$f(x, \omega) = \sum_{j=1}^{m} \omega_j \, g_j(x) + b \tag{1}$$
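The quality of estimation is measured by the loss function $L(y, f(x, \omega))$. SVM regression uses the ε-insensitive loss function proposed by Vapnik [6]:

$$L_\varepsilon(y, f(x, \omega)) = \begin{cases} 0, & \text{if } |y - f(x, \omega)| \le \varepsilon \\ |y - f(x, \omega)| - \varepsilon, & \text{otherwise} \end{cases} \tag{2}$$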
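A minimal NumPy sketch of Vapnik's ε-insensitive loss: residuals inside the ε tube cost nothing, and larger residuals are penalized linearly by the amount they exceed ε. The squared loss is computed alongside for contrast, since it is the error term LS-SVM uses. The numbers are arbitrary illustrations.

```python
import numpy as np


def eps_insensitive_loss(y, f, eps):
    """Vapnik's epsilon-insensitive loss: 0 if |y - f| <= eps, otherwise |y - f| - eps."""
    residual = np.abs(np.asarray(y, dtype=float) - np.asarray(f, dtype=float))
    return np.maximum(0.0, residual - eps)


y = np.array([1.0, 2.0, 3.0])
f = np.array([1.05, 2.5, 2.0])        # predictions: inside the tube, 0.5 off, 1.0 off
loss = eps_insensitive_loss(y, f, eps=0.1)
squared_loss = (y - f) ** 2           # quadratic loss, for contrast
print(loss)          # approximately [0, 0.4, 0.9]
print(squared_loss)  # approximately [0.0025, 0.25, 1.0]
```

Unlike the squared loss, the ε-insensitive loss ignores small residuals entirely and grows only linearly for large ones.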
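SVM regression reduces model complexity by minimizing $\|\omega\|^2$. In addition, it introduces non-negative slack variables $\xi_i$, $i = 1, \dots, n$, to measure the deviation of training samples outside the ε-insensitive zone. SVM regression is thus formulated as the minimization of the following function:

$$\min_{\omega,\, b,\, \xi} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad |y_i - f(x_i, \omega)| \le \varepsilon + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n \tag{3}$$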
Compared with standard SVM, LS-SVM obtains the solution by solving a linear system instead of a quadratic programming problem. This is because LS-SVM uses equality constraints instead of the inequality constraints in the formulation above, together with a squared-error term in place of the ε-insensitive loss. It is well known that the generalization performance (estimation accuracy) of LS-SVM depends on a good setting of the meta-parameter C and the kernel parameters. The main performance metric of LS-SVM is the prediction risk, defined in Equation (4) as the mean squared error (MSE) between the values estimated by LS-SVM and the true values for the $n_t$ test inputs:

$$R = \frac{1}{n_t} \sum_{i=1}^{n_t} \left( y_i - f(x_i, \omega) \right)^2 \tag{4}$$
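The linear system mentioned above can be sketched in NumPy. Assuming the standard LS-SVM regression formulation (squared errors with equality constraints) and an RBF kernel of the form exp(−‖x − x′‖²/(2γ²)), the dual problem reduces to [[0, 1ᵀ], [1, K + I/C]]·[b; α] = [0; y], and predictions are f(x) = Σ α_i K(x, x_i) + b. The kernel parameterization and the synthetic data are assumptions; LS-SVMlab's own RBF convention may differ.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Assumed RBF form exp(-||a - b||^2 / (2 * gamma^2)), with gamma as the kernel width."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * gamma ** 2))


def lssvm_fit(X, y, C, gamma):
    """Solve [[0, 1^T], [1, K + I/C]] @ [b, alpha] = [0, y] for the bias b and support values alpha."""
    n = len(y)
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = rbf_kernel(X, X, gamma) + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, X_new, gamma):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha + b


def prediction_risk(y_true, y_pred):
    """Prediction risk (Equation (4)): mean squared error on the test inputs."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def response(X):
    """Hypothetical smooth response used only to generate demo data."""
    return np.sin(3 * X[:, 0]) + X[:, 1] - X[:, 2] ** 2


# Hypothetical data: 15 training and 10 test points with 3 inputs scaled to [0, 1]
rng = np.random.default_rng(0)
X_train, X_test = rng.uniform(0, 1, (15, 3)), rng.uniform(0, 1, (10, 3))
C, gamma = 1500.0, 0.5
b, alpha = lssvm_fit(X_train, response(X_train), C, gamma)
risk = prediction_risk(response(X_test), lssvm_predict(X_train, b, alpha, X_test, gamma))
print("test MSE:", risk)
```

A single call to `np.linalg.solve` replaces the quadratic program of Equation (3). The first row of the system forces Σ α_i = 0, and each training residual equals α_i / C.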
Therefore, to ensure good generalization performance, the main issue in applying LS-SVM is the proper setting of these parameters for a given data set. The choice of kernel type and kernel parameters is usually based on application-domain knowledge and should also reflect the distribution of the input values in the training data [20]. Here, we use SVM regression with radial basis function (RBF) kernels (Equation (5)), where the RBF width parameter γ should reflect the distribution/range of the x-values of the training data.
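To illustrate why γ should reflect the range of the inputs, the sketch below compares kernel matrices for 15 evenly spaced inputs on [0, 1] under three widths. It assumes the RBF form exp(−‖x − x′‖²/(2γ²)), which may not match the toolbox's convention. It also includes a min-max rescaling helper, a common preprocessing step assumed here (not stated in the text) that puts factors measured in different units on a common scale so that a single width is meaningful.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Assumed RBF form exp(-||a - b||^2 / (2 * gamma^2)); gamma is the kernel width."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * gamma ** 2))


def min_max_scale(X):
    """Rescale each column to [0, 1]; assumes no column is constant."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


X = np.linspace(0.0, 1.0, 15)[:, None]    # 15 inputs, neighbor spacing 1/14 (about 0.071)
K_narrow = rbf_kernel(X, X, gamma=0.01)   # width far below the spacing: K is nearly the identity
K_wide = rbf_kernel(X, X, gamma=100.0)    # width far above the range: K is nearly all ones
K_matched = rbf_kernel(X, X, gamma=0.3)   # width comparable to the data range

# Hypothetical raw inputs in different units: LipofectAMINE (uL), plasmid (ug), cells (x10^4)
X_raw = np.array([[0.5, 0.2, 2.0], [4.0, 1.6, 16.0], [2.0, 0.8, 9.0]])
X_scaled = min_max_scale(X_raw)
print(np.round(K_matched[0, :4], 3))
print(X_scaled)
```

A nearly diagonal kernel lets the model memorize the training points, while a nearly constant kernel makes every point look alike. Useful widths lie between these extremes, relative to the spread of the inputs.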
The LS-SVM models were built with the software package LS-SVMlab1.5, available at http://www.esat.kuleuven.ac.be/sista/lssvmlab/. This toolbox provides extensive LS-SVM functionality, including functions for tuning, optimizing, validating, and training LS-SVMs, and it also provides a convenient visual representation of the trained models. The meta-parameters of the LS-SVM model with a Gaussian kernel function are γ (the width of the Gaussian kernel) and C (the regularization factor). Model construction consisted of the following consecutive steps: i) selection of the input variables, ii) selection of the training parameters (C and γ), iii) construction of the model, and iv) performance evaluation on the validation data.
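Steps i) to iv) can be sketched outside LS-SVMlab. The Python version below is a stand-in, not the toolbox's tuning routine. It selects (C, γ) by brute-force leave-one-out MSE over a small, arbitrarily chosen grid, then trains on all training data and evaluates on a validation set. The RBF form exp(−‖x − x′‖²/(2γ²)), the grid, and the synthetic data are assumptions.

```python
import itertools

import numpy as np


def rbf_kernel(A, B, gamma):
    """Assumed RBF form exp(-||a - b||^2 / (2 * gamma^2)); gamma is the kernel width."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * gamma ** 2))


def lssvm_fit(X, y, C, gamma):
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/C]] @ [b, alpha] = [0, y]."""
    n = len(y)
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = rbf_kernel(X, X, gamma) + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ alpha + b


def loo_mse(X, y, C, gamma):
    """Leave-one-out MSE: refit without sample i, then predict sample i."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, alpha = lssvm_fit(X[keep], y[keep], C, gamma)
        pred = lssvm_predict(X[keep], b, alpha, X[i:i + 1], gamma)[0]
        errors.append((y[i] - pred) ** 2)
    return float(np.mean(errors))


def tune(X, y, C_grid, gamma_grid):
    """Step ii: pick the (C, gamma) pair with the lowest leave-one-out MSE."""
    scores = {(C, g): loo_mse(X, y, C, g) for C, g in itertools.product(C_grid, gamma_grid)}
    return min(scores, key=scores.get), scores


def response(X):
    """Hypothetical smooth response used only to generate demo data."""
    return np.sin(3 * X[:, 0]) + X[:, 1] - X[:, 2] ** 2


rng = np.random.default_rng(0)
X_train, X_val = rng.uniform(0, 1, (15, 3)), rng.uniform(0, 1, (10, 3))  # step i: scaled inputs
y_train, y_val = response(X_train), response(X_val)

best_params, scores = tune(X_train, y_train, C_grid=[1.0, 10.0, 100.0, 1000.0],
                           gamma_grid=[0.1, 0.3, 1.0, 3.0])
C, gamma = best_params
b, alpha = lssvm_fit(X_train, y_train, C, gamma)                               # step iii
val_mse = float(np.mean((y_val - lssvm_predict(X_train, b, alpha, X_val, gamma)) ** 2))  # step iv
print("best (C, gamma):", best_params, "validation MSE:", round(val_mse, 4))
```

Brute-force refitting is affordable here because each fit solves only a 15 × 15 system. For larger data sets, closed-form leave-one-out formulas or k-fold cross-validation would be cheaper.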
Contribution analysis of independent variables
The contribution of each variable was evaluated using a method based on LS-SVM. Randomly permuting the values of a specific variable has an effect similar to removing that variable, while the feature space remains unchanged. Because there are only three variables, they can be analyzed one by one. The procedure was as follows: i) select a variable, such as cell density; ii) randomly permute the values of that variable across the 15 training samples, keeping the other two variables (the amount of plasmid and the amount of LipofectAMINE) unchanged; iii) train another LS-SVM with the same parameters (γ = 42, C = 1500); iv) test the trained LS-SVM on the 10 testing samples to obtain the predicted values and the MSE; v) repeat steps ii to iv 10 times and compute the average predicted values and the average MSE; vi) repeat steps i to v for the other two variables. The variable with the largest MSE has the greatest effect on the dependent variable.
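The permutation procedure above can be sketched in Python. The LS-SVM solver, the RBF form exp(−‖x − x′‖²/(2γ²)), and the synthetic data are illustrative assumptions. The input scaling behind γ = 42 is not stated, so the example uses γ = 0.5 for inputs scaled to [0, 1] while keeping C = 1500 and 10 repeats.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Assumed RBF form exp(-||a - b||^2 / (2 * gamma^2)); gamma is the kernel width."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * gamma ** 2))


def lssvm_fit(X, y, C, gamma):
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/C]] @ [b, alpha] = [0, y]."""
    n = len(y)
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = rbf_kernel(X, X, gamma) + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ alpha + b


def permutation_contribution(X_tr, y_tr, X_te, y_te, C, gamma, repeats=10, seed=0):
    """Return (average test MSE, average predictions) per input variable after shuffling it."""
    rng = np.random.default_rng(seed)
    results = []
    for j in range(X_tr.shape[1]):                           # steps i and vi: each variable in turn
        mses, preds = [], []
        for _ in range(repeats):                             # step v: repeat steps ii-iv
            X_perm = X_tr.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])     # step ii: shuffle one column only
            b, alpha = lssvm_fit(X_perm, y_tr, C, gamma)     # step iii: retrain, same parameters
            p = lssvm_predict(X_perm, b, alpha, X_te, gamma) # step iv: predict the test set
            preds.append(p)
            mses.append(np.mean((y_te - p) ** 2))
        results.append((float(np.mean(mses)), np.mean(preds, axis=0)))
    return results


def response(X):
    """Hypothetical response: strong effect of input 0, weak of input 1, none of input 2."""
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]


rng = np.random.default_rng(1)
X_tr, X_te = rng.uniform(0, 1, (15, 3)), rng.uniform(0, 1, (10, 3))
results = permutation_contribution(X_tr, response(X_tr), X_te, response(X_te), C=1500.0, gamma=0.5)
avg_mse = [mse for mse, _ in results]
most_important = int(np.argmax(avg_mse))   # largest MSE = largest contribution
print("average MSE per variable:", np.round(avg_mse, 4), "most important:", most_important)
```

Because the synthetic response ignores the third input, shuffling that column leaves the training set statistically unchanged, so its average MSE stays low. Shuffling the first input breaks the input-output relation and inflates the MSE.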