A modeling study by response surface methodology and artificial neural network on culture parameters optimization for thermostable lipase production from a newly isolated thermophilic Geobacillus sp. strain ARM

Background Thermostable bacterial lipases occupy a place of prominence among biocatalysts owing to their novel, multifold applications and resistance to high temperature and other operational conditions. The capability of lipases to catalyze a variety of novel reactions in both aqueous and nonaqueous media presents a fascinating field for research, creating interest to isolate novel lipase producers and optimize lipase production. The most important stages in a biological process are modeling and optimization to improve a system and increase the efficiency of the process without increasing the cost. Results Different production media were tested for lipase production by a newly isolated thermophilic Geobacillus sp. strain ARM (DSM 21496 = NCIMB 41583). The maximum production was obtained in the presence of peptone and yeast extract as organic nitrogen sources, olive oil as carbon source and lipase production inducer, sodium and calcium as metal ions, and gum arabic as emulsifier and lipase production inducer. The best models for optimization of culture parameters were achieved by multilayer full feedforward incremental back propagation network and modified response surface model using backward elimination, where the optimum condition was: growth temperature (52.3°C), medium volume (50 ml), inoculum size (1%), agitation rate (static condition), incubation period (24 h) and initial pH (5.8). The experimental lipase activity was 0.47 Uml-1 at optimum condition (4.7-fold increase), which compared well to the maximum predicted values by ANN (0.47 Uml-1) and RSM (0.476 Uml-1), whereas R2 and AAD were determined as 0.989 and 0.059% for ANN, and 0.95 and 0.078% for RSM respectively. Conclusion Lipase production is the result of a synergistic combination of effective parameters interactions. These parameters are in equilibrium and the change of one parameter can be compensated by changes of other parameters to give the same results. 
Although both the RSM and ANN models provided good-quality predictions in this study, the ANN showed a clear superiority over RSM in both data fitting and estimation capabilities. On the other hand, ANN has the disadvantage of requiring larger amounts of training data than RSM. This problem was addressed by using statistical experimental design to reduce the number of experiments.


Background
Today, lipases (EC 3.1.1.3, triacylglycerol acylhydrolases) stand among the most important biocatalysts. They carry out novel reactions in both aqueous and nonaqueous media. Lipases hydrolyze the ester bonds of a variety of nonpolar substrates with high activity and with chemo-, regio- and stereoselectivity. Moreover, they catalyze the reverse reactions (such as esterification [1] and transesterification [2]) in nonpolar solvents [3] and [4].
Among lipases of different sources, microbial thermostable lipases are highly advantageous for biotechnological applications, since they can be produced at low cost and exhibit improved stability [3]. Thus, various thermostable lipase-producing microorganisms have been isolated from diverse habitats [5][6][7].
Bacterial lipases are mostly extracellular and their production is greatly influenced by nutritional and physicochemical factors, such as nitrogen and carbon sources, metal ions, initial pH, temperature, medium volume, agitation rate, incubation period, inoculum size and aeration [8] and [9].
The most important stages in a biological process are modeling and optimization, which improve a system and increase the efficiency of the process without increasing the cost [10]. The classical optimization method (single-variable optimization) is not only time-consuming and tedious but also fails to depict the complete effects of the parameters in the process and ignores the combined interactions between physicochemical parameters. This method can also lead to misinterpretation of results [10] and [11]. In contrast, response surface methodology (RSM) is an empirical modeling system for developing, improving and optimizing complex processes [12] and [5]. RSM assesses the relationships between the response(s) and the independent variables [13], and defines the effect of the independent variables, alone or in combination, on the process.
Although RSM has many advantages and has been successfully applied to study and optimize enzymatic processes [14] and [15] and enzyme production from microorganisms [16] and [17], it cannot be assumed to be applicable to all optimization and modeling studies [18][19][20]. The past decade has seen a host of data analysis tools based on biological phenomena develop into well-established modeling techniques, such as artificial intelligence and evolutionary computing. Artificial neural networks (ANNs) are now the most popular artificial learning tool in biotechnology, with a wide range of applications that includes optimization of bioprocesses [21] and enzyme production from microorganisms [22].
Indeed an ANN is a massively interconnected network structure consisting of many simple processing elements capable of performing parallel computation for data processing. The fundamental processing element of ANNs (the artificial neuron) simulates the basic functions of biological neurons [18] and [23].
In this work, after finding the best composition of production medium among the best previously published and modified media, the optimization of physical factors for extracellular thermostable lipase production from a newly isolated Geobacillus sp. strain ARM (DSM 21496 = NCIMB 41583) was carried out using RSM and ANN.

Effect of various production media on lipase production
The production of lipases is mostly inducer-dependent [24], and different media have different stimulatory effects on lipase production [9], depending on the physiological and biochemical pathways of the bacterium. In order to select the best lipase production medium, the ability of the bacterium to produce lipase was tested in eight different liquid media (Figure 1). Lipase activity was significantly higher in medium A1 than in the other production media. Medium A1 is composed of peptone and yeast extract as organic nitrogen sources, olive oil as carbon source and lipase production inducer, sodium and calcium as metal ions, and gum arabic as emulsifier and lipase production inducer.
Generally, microorganisms provide high yields of lipase when organic nitrogen sources are used, such as peptone and yeast extract, which have been used for lipase production by various thermophilic Bacillus sp. [25,26] and [27]. Yeast extract is one of the most important nitrogen sources for high level lipase production by different microorganisms [28]. Besides this role, yeast extract supplies vitamins and trace elements for the growth of bacteria and increases their lipase production [29].
High levels of lipase production were reported from various thermophilic Bacillus sp. in the presence of olive oil as carbon source in the culture medium [6,27,30] and [28]. Most published experimental data have shown that lipid carbon sources (especially natural oils) stimulate lipase production [9,31] and [32], whereas carbon sources that are easily broken down and used by bacteria play an inhibitory role [30,33] and [34]. Different microorganisms have different requirements for metal ions. Calcium ions play essential roles in many microbial species. They are important in maintaining cell wall rigidity, stabilizing oligomeric proteins and covalently binding protein-peptidoglycan complexes in the outer membrane [35]. Lipase production by various Bacillus sp. was stimulated in the presence of Ca2+ alone [26] and [36] or in combination with other ions such as Mg2+ and Fe2+ [37].
On the other hand, highly branched, helically configured, non-metabolizable polysaccharides such as gum arabic are able to enhance lipase production. This is probably due to the emulsification of oil-containing culture media, which increases the lipid surface (the interfacial area between oil and water) available for lipase action, and to the detachment of lipase from the oil surface and from binding sites at the outer membrane of Gram-negative bacteria [30,38] and [39].
As a result, A1 production medium was chosen as the medium to be used in the further optimization of lipase production.

Analyzing and modeling
The central composite rotary design (CCRD) along with the observed responses is shown in Table 1.

Response surface methodology
Fitting the data to various models (linear, two factorial, quadratic and cubic) and their subsequent ANOVA showed that none of these models could adequately explain the effects of the physical factors on lipase production. To overcome this problem, we used a backward elimination strategy followed by hierarchical term addition to find the best model. Backward elimination started with all of the predictors in the model. The least significant variable (the one with the largest P-value) was removed and the model was refitted. Each subsequent step removed the least significant variable in the model until all remaining variables had individual P-values smaller than 0.05 [40]. Finally, a modified cubic equation (equation 1) and its subsequent ANOVA (Table 2) gave a suitable model for optimizing lipase production. In effect, the modified model was a quadratic model with one term eliminated (V.Ag) and one term added (T.Ag.t).
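The structure of the modified model can be illustrated by enumerating its terms. The following is a minimal sketch, not the fitted equation 1: the labels T, V, Ag and t follow the term names used above (V.Ag, T.Ag.t), whereas "I" for inoculum size and "pH" for initial pH are hypothetical labels introduced here, and it is assumed that all other quadratic terms were retained.

```python
from itertools import combinations

# T, V, Ag and t follow the term names in the text (V.Ag, T.Ag.t);
# "I" (inoculum size) and "pH" are hypothetical labels chosen for this sketch.
FACTORS = ["T", "V", "I", "Ag", "t", "pH"]

def full_quadratic_terms(factors):
    """Return the term labels of a full second-order polynomial."""
    linear = list(factors)
    squared = [f"{f}^2" for f in factors]
    two_way = [f"{a}.{b}" for a, b in combinations(factors, 2)]
    return ["intercept"] + linear + squared + two_way

def modified_terms(factors):
    """Quadratic model without V.Ag and with the three-way term T.Ag.t added."""
    terms = [t for t in full_quadratic_terms(factors) if t != "V.Ag"]
    return terms + ["T.Ag.t"]

full = full_quadratic_terms(FACTORS)
modified = modified_terms(FACTORS)
print(len(full), len(modified))
```

Under these assumptions both the full quadratic model and the modified model contain the same number of coefficients, since one interaction is removed and one is added.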
The computed model F-value of 1176.88 implies that the model is significant; there is only a 0.01% chance that a "model F-value" this large could occur due to noise. The "lack of fit F-value" of 0.18 implies that the lack of fit is not significant relative to the pure error; there is a 69.32% chance that a "lack of fit F-value" this large could occur due to noise. A non-significant lack of fit indicates that the model fits the data adequately. In addition, the pure error is very low, indicating good reproducibility of the data obtained.
With a very small "model P-value" (< 0.0001) and a large "lack of fit P-value" (0.6932) from the ANOVA, together with a suitable coefficient of determination (R2 = 0.9998) and adjusted coefficient of determination (adjusted R2 = 0.999), the modified cubic polynomial model was highly significant and sufficient to represent the actual relationship between the response and the significant variables (Table 2).
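The relation between the two coefficients of determination can be checked with the usual adjusted R2 formula. The sketch below assumes 33 CCRD runs and 27 model terms besides the intercept (a full quadratic in six factors has 27 such terms, and the modified model drops one and adds one); both counts are assumptions made for illustration, not values taken from Table 2.

```python
def adjusted_r2(r2, n_runs, n_terms):
    """Adjusted R^2 for a model with n_terms predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)

# Assumed counts: 33 runs, 27 non-intercept terms.
adj = adjusted_r2(0.9998, n_runs=33, n_terms=27)
print(round(adj, 3))  # prints 0.999
```

With these assumed counts the result agrees with the reported adjusted value to three decimals.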

Artificial neural network
Effect of architecture and topology on neural network performance
The selection of an optimal neural-network architecture and topology is of critical importance for a successful application. Several neural-network architectures and topologies were tested for the estimation and prediction of lipase production. Table 3 summarizes the top five ANN models.

Effect of learning algorithm and transfer function
Training a neural network model essentially means selecting one model from the set of allowed models that minimizes the cost criterion. We have tested different learning algorithms for training neural network models. All accepted models (RMSE < 0.0001, R = 1 and DC = 1) have shown that incremental back propagation (IBP) was the most suitable learning algorithm for prediction of lipase production ( Table 3).
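The idea behind incremental back propagation, where weights are updated after every training pattern rather than once per pass, can be sketched for a single linear neuron. This is an illustration only and not the NeuralPower implementation: the update rule with momentum is the textbook form, the learning rate (0.15) and momentum (0.8) are the values reported for the final network, and the two data points are made up.

```python
def ibp_step(weight, gradient, prev_delta, learning_rate=0.15, momentum=0.8):
    """One update: delta = -learning_rate * gradient + momentum * previous delta."""
    delta = -learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta

def train_incremental(samples, epochs=100):
    """Fit y = w * x, updating w after every sample (incremental mode)."""
    w, delta = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            gradient = (w * x - y) * x  # derivative of 0.5 * (w*x - y)**2 with respect to w
            w, delta = ibp_step(w, gradient, delta)
    return w

# Made-up data on coded levels -1 and +1, generated from y = 2 * x.
samples = [(-1.0, -2.0), (1.0, 2.0)]
w_fit = train_incremental(samples)
print(round(w_fit, 6))
```

In batch back propagation the gradients of all patterns would instead be summed before a single update per pass.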
The type of transfer function employed affects the neural network's learning rate and is instrumental in its performance. In the present work, among all transfer functions employed for the hidden and output layers, the accepted models used a linear function for the output layer and a Gaussian or hyperbolic tangent (Tanh) function for the hidden layer. Of these, the best models were obtained with the Gaussian function.

Figure 1. Lipase activity in different compositions of production media.

Optimal number of hidden neurons
Although it is important to select the optimal number of hidden neurons carefully, depending on the type and complexity of the task, this usually has to be done by trial and error. An increase in the number of hidden neurons up to a point usually results in better learning performance. Too few hidden neurons limit the ability of the neural network to model the process, whereas too many may allow too much freedom for the weights to adjust and thus result in learning the noise present in the database used in training [41]. We tested the effect of the number of hidden neurons on the goodness of fit. The results of testing with the two sample experiments, evaluated statistically on the basis of the coefficient of determination (R2), are shown in Figure 2. In both examined cases, the optimum number of hidden neurons was 16, with obvious overfitting when too many hidden neurons were used. Therefore, the 6-16-1 topology was chosen as the best topology for estimation of lipase production.

Artificial neural network analysis of lipase production
The best ANN chosen in the present work was a multilayer full feedforward incremental back propagation network with Gaussian transfer function (Table 3, C21) and a 6-16-1 topology. The optimized values of the learning rate and momentum were 0.15 and 0.8, respectively. Learning was completed at RMSE = 9.99E-5, R = 1 and DC = 1. For the training data set, the coefficient of determination (R2) and absolute average deviation (AAD) were 1 and 0.1%, respectively; for the testing data set, R2 was 1 and AAD was 0.231% (Table 4); and for the validating data set, R2 and AAD were 0.989 and 0.059%, respectively (Table 5). Comparison of predicted and experimental values in the training, testing and validating data sets not only revealed the capability of the ANN to predict known data responses (the data used for training) but also showed its ability to generalize to unknown data (the data not used for training), implying that empirical models derived from ANN can be used to adequately describe the relationship between the input factors and lipase production.
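A forward pass through a 6-16-1 network with a Gaussian hidden layer and a linear output neuron can be sketched as follows. Several points are assumptions: the Gaussian transfer function is taken to be exp(-x^2), the weights are random rather than trained, and the network is a plain layered feedforward net that does not attempt to reproduce the "full feedforward" connectivity of the software used in the study.

```python
import math
import random

def gaussian(x):
    """Gaussian transfer function, assumed here to be exp(-x^2)."""
    return math.exp(-x * x)

def init_network(n_in=6, n_hidden=16, seed=0):
    """Random weights for a 6-16-1 network (illustrative, untrained)."""
    rng = random.Random(seed)
    # Each row holds n_in input weights followed by one bias.
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    # n_hidden output weights followed by one bias.
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output

def forward(inputs, hidden, output):
    """Gaussian hidden layer followed by a linear output neuron."""
    h = [gaussian(w[-1] + sum(wi * xi for wi, xi in zip(w, inputs))) for w in hidden]
    return output[-1] + sum(wo * hi for wo, hi in zip(output, h))

hidden_w, output_w = init_network()
# Six inputs in coded units; all zeros corresponds to a design center point.
y = forward([0.0] * 6, hidden_w, output_w)
print(y)
```

The six inputs correspond to the six culture parameters and the single output to the predicted lipase activity.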

Comparison of RSM and ANN predicted values
The predicted output values of RSM and ANN are shown in Table 4. Although both models performed well and offered stable responses, the ANN-based approach was better than RSM in both data fitting and estimation capabilities.

Main effects and interaction between parameters
The optimum level of each variable and the effect of their interactions on lipase production as a function of two variables were studied by plotting three dimensional response surface curves (while keeping the other variables at optimum point).
ANOVA (Table 2) and the three-dimensional plots (Figure 3) reveal that growth temperature, medium volume, inoculum size and incubation period had significant effects on lipase production. The ANOVA shows that although pH was not a significant parameter (P value > 0.05), it had significant interactions with other parameters; hence it was used to develop the model. On the other hand, among the different interactions, the interaction between agitation rate and growth volume did not show a significant effect on lipase production (P value > 0.05). Figure 3B shows that lipase activity increased with a decrease in growth volume, whereas agitation rate did not show a significant effect on lipase production. On the other hand, the ANOVA and Figure 3D reveal that agitation is one of the most important parameters for lipase production. In conclusion, although both agitation rate and growth volume are significant parameters, their interaction is not significant for lipase production. Hence modification of model via …
Figure 3A represents the three-dimensional plot of lipase activity as a function of temperature and inoculum size. Maximum lipase activity of 0.47 Uml-1 was obtained at 52.3°C and 1.0% inoculum size. A further increase or decrease in temperature, or an increase in inoculum size, led to a decrease in enzyme production. Generally, the optimum temperature for lipase production corresponds with the growth temperature of the respective microorganism [26]. It has also been shown that temperature regulates enzyme synthesis at the mRNA transcription and probably translation levels. For extracellular enzymes, temperature influences their secretion, possibly by changing the physical properties of the cell membrane [42]. On the other hand, although higher temperature causes higher reaction rates and higher solubility of substrate and products, oxygen solubility is usually decreased.
At a suitable inoculum size, the nutrient and oxygen levels are sufficient for bacterial growth and therefore enhance lipase production. If the inoculum size is too small, an insufficient number of bacteria will lead to a reduced amount of secreted lipase. A high inoculum size can result in a lack of oxygen and in nutrient depletion in the culture media [42] and [43].
Figures 3B and 3C depict the medium volume-agitation rate and initial pH-agitation rate interactions, respectively. These plots reveal that lipase activity increased with a decrease in culture volume and agitation rate. The maximum lipase activity was obtained at 50 ml culture volume and moderately acidic pH (5.8) under static conditions. Similarly, static conditions resulted in comparatively high lipase production for Syncephalastrum racemosum [44], Pseudomonas sp. strain S5 [45] and Pseudomonas aeruginosa [46]. Generally, suitable agitation leads to a sufficient supply of dissolved oxygen in the media [47] and also increases nutrient uptake by bacteria [48]. However, the degree of aeration appears to be critical in some cases, since shallow-layer (static) cultures (moderate aeration) produced much more lipase than shake cultures (high aeration) [45].
The medium volume may have a great effect on enzyme production. Although a larger medium volume initially contains more oxygen, nutrients and space for bacterial growth, the void in the container, and consequently the oxygenation of the medium, is decreased. On the other hand, it seems that the ratio of surface area to volume (A/V) is important for lipase production, with a higher ratio causing higher oxygenation and lipase production [49].
The combined effect of initial pH and incubation time on lipase production is shown in Figure 3D. According to the plot, a moderately acidic initial pH (5.8) gave maximum lipase production after 24 h of cultivation. The activity decreased remarkably as the incubation period changed. pH plays an important role in all biological processes, and the initial pH of the growth medium is important for lipase production [8]. Most bacteria prefer a neutral initial pH for the best growth and lipase production, such as thermophilic Bacillus sp. strains L2 and 398 [50] and [51]. Maximum lipase activity at higher initial pH has also been reported for various thermophilic Bacillus sp. [25] and [32]. In contrast, Ertugrul et al. [52] reported a moderately acidic pH (6.0) as the optimum initial pH for lipase production by Bacillus sp. Molecular electric charges, and consequently molecular interactions and functions, are directly related to the pH of the medium; thus any change in medium pH affects many biological functions such as enzymatic processes, signaling pathways and the transport of various components across the cytoplasmic membrane and cell wall [53]. Therefore, medium pH is very important for nutrient absorption and growth of bacteria, for stimulation of enzyme production via signaling pathways, and for the release of extracellular enzymes (based on the proteolytic mechanism of signal peptidases explained by Paetzel et al. [54]).
Lipases are produced throughout bacterial growth, with peak production being obtained by late exponential growth phase [55]. Therefore, the optimum incubation time is based on duration of log phase that is influenced by environmental conditions as well as by characteristics of the organism itself.
Different optimum conditions for maximum lipase production by various thermophilic Bacillus sp. have been reported [25,32,50] and [51]. Strain differences and synergistic effects with other factors present in the medium might be responsible for the differences in the results obtained. Although no conclusive picture has emerged so far from the large amount of experimental data concerning the physiology of lipase biosynthesis and its regulation, most published experimental data seem to support the following inference. At the end of log phase, when one of the essential nutrients of the culture medium is used up or some waste product of the organism builds up in the medium to an inhibitory level, microorganisms try to overcome the problem and continue to grow. One response to this problem is the production of extracellular hydrolytic enzymes such as lipases, proteases and amylases. In other words, limitation of growth can be an inducer for the production of some enzymes. On the other hand, Table 6 shows that lipase production is the result of a synergistic combination of the interactions between effective parameters. These parameters are in equilibrium, meaning that a change in one parameter can be compensated by changes in other parameters to give the same result.
Finally, Figure 4 shows the relative importance (in percent) of the effective parameters for lipase production. Inoculum size (18.15%) was the most important factor, followed by incubation period (17.01%), agitation rate (16.78%), growth temperature (16.46%), medium volume (16.44%) and pH (15.19%).

Optimization of reaction
The optimal conditions for lipase production were predicted as presented in Table 5, along with their predicted and actual values. Among the various optimum conditions, the highest lipase activity (0.47 Uml-1; 4.7-fold increase) was obtained at the following conditions: growth temperature (52.3°C), medium volume (50 ml), inoculum size (1%), agitation rate (static condition), incubation period (24 h) and initial pH (5.8).

Figure 2. Optimal number of hidden neurons. Estimation of lipase production with neural networks of varying number of hidden neurons, tested with two example cases: incremental back propagation multilayer full feedforward (blue diamond) and multilayer normal feedforward incremental back propagation (pink square) with Gaussian transfer functions.

Conclusion
In this work, different production media were tested for lipase production by a newly isolated thermophilic Geobacillus sp. strain ARM (DSM 21496 = NCIMB 41583). The maximum production was obtained in the presence of peptone and yeast extract as organic nitrogen sources, olive oil as carbon source and lipase production inducer, sodium and calcium as metal ions, and gum arabic as emulsifier and lipase production inducer. In addition, optimization of culture parameters and estimation of lipase production using the RSM and ANN methods were successfully carried out. The best models were achieved by a multilayer full feedforward incremental back propagation network and a modified response surface model using backward elimination, where the optimum condition was: growth temperature (52.3°C), medium volume (50 ml), inoculum size (1%), agitation rate (static condition), incubation period (24 h) and initial pH (5.8). The experimental lipase activity was 0.47 Uml-1 at the optimum condition (4.7-fold increase), which compared well with the maximum values predicted by ANN (0.47 Uml-1) and RSM (0.476 Uml-1), whereas R2 and AAD were determined as 0.989 and 0.059% for ANN, and 0.95 and 0.078% for RSM, respectively. Although the modified response surface model was comparable to ANN in providing good-quality predictions of lipase production for the six independent variables, the ANN showed a clear superiority over RSM as a modeling technique for data sets showing nonlinear relationships.
On the other hand, ANN has the disadvantage of requiring large amounts of training data in comparison with RSM [56]. This problem was solved by using statistical experimental design to reduce the number of experiments … [22] and modeling the growth of a bacterium (25 different combinations) [60]. Bas and Boyaci [18] employed face-centered design (FCD) and modified face-centered design (MFCD) for an ANN study (13 different combinations for training).
In conclusion, lipase production is the result of a synergistic combination of the interactions between effective parameters. These parameters are in equilibrium, and a change in one parameter can be compensated by changes in other parameters to give the same result. In addition, ANN can be a very powerful and flexible tool for modeling the optimization process.

Bacterial strain
The bacterial strain used in this study was isolated from oil-contaminated soil from Selangor, Malaysia, identified as Geobacillus sp. strain ARM via 16S rDNA analysis [GenBank:EF025325] and deposited in DSMZ, Germany (DSM 21496) and NCIMB, UK (NCIMB 41583). The strain was preserved in sterile 16% (v/v) glycerol in Tryptic Soy Broth (TSB) at -80°C.

Composition of lipase production medium
In order to select the best lipase production medium, eight different media were tested. The composition of the media was (% w/v): M1: peptone (3) The media were sterilized for 15 min at 121°C after pH adjustment to 7.0. Bacterial inoculum (2% v/v; Ab 600 = 0.5 of overnight culture in TSB) was then inoculated into 50 ml production medium and incubated by agitation under 150 rpm, for 48 h at 60°C. The cell free supernatant was obtained by centrifugation at 12,000 g, 4°C for 15 min prior to lipase assay.

Lipase activity assay
Liberated free fatty acid was determined by colorimetric assay [58] using olive oil as substrate. The enzymatic reaction was performed in a water bath shaker for 30 min at 50°C under 200 rpm agitation. One unit of lipase activity was defined as 1.0 μmol of free fatty acid liberated min-1 and reported as Uml-1.
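The unit definition translates into a simple calculation. The sketch below only illustrates the arithmetic: the amount of fatty acid released (6.0 μmol) and the volume of supernatant in the assay (1.0 ml) are hypothetical values, since the assay volumes are not given here.

```python
def lipase_activity(ffa_umol, reaction_min, enzyme_ml):
    """Activity in U/ml: micromoles of free fatty acid per minute per ml of enzyme."""
    return ffa_umol / (reaction_min * enzyme_ml)

# Hypothetical reading: 6.0 umol released in 30 min by 1.0 ml of supernatant.
activity = lipase_activity(ffa_umol=6.0, reaction_min=30.0, enzyme_ml=1.0)
print(activity)
```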

Experimental design
A five-level-six-factor central composite rotary design (CCRD) was employed in this study, requiring 33 experiments [59]. The variables and their levels selected for the lipase production optimization were: growth temperature (45 - 65°C); medium volume (50 - 200 ml); inoculum size (1 - 5%); agitation rate (0 - 200 rpm); incubation period (24 - 72 h) and initial pH (5 - 9). The experimental data (40 points, comprising the CCRD design (Table 1) and the optimization data (Table 5)) were divided into three sets: training, testing and validating. All tests were performed in triplicate.

Figure 4. Importance of effective parameters on lipase production.
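The relation between coded and actual factor levels in such a design can be sketched as follows. The sketch rests on assumptions that are not stated in the text: the design is taken to be a quarter-fraction factorial (16 points) with 12 axial points and 5 center points, which is one combination that gives 33 runs, the axial distance is taken as 2 in coded units, and the quoted ranges are taken as the axial extremes. The factor names are labels chosen here.

```python
# Ranges quoted in the text, assumed to be the axial extremes (coded -2 and +2).
RANGES = {
    "temperature_C": (45.0, 65.0),
    "medium_volume_ml": (50.0, 200.0),
    "inoculum_percent": (1.0, 5.0),
    "agitation_rpm": (0.0, 200.0),
    "incubation_h": (24.0, 72.0),
    "initial_pH": (5.0, 9.0),
}
ALPHA = 2.0  # assumed axial distance, i.e. 16 ** 0.25 for 16 factorial points

def decode(factor, coded):
    """Convert a coded level (-2 .. +2) to the actual value of a factor."""
    low, high = RANGES[factor]
    center = (low + high) / 2.0
    step = (high - low) / (2.0 * ALPHA)
    return center + coded * step

def design_size(n_factors=6, fraction=2, n_center=5):
    """Number of runs: fractional factorial + axial + center points."""
    factorial = 2 ** (n_factors - fraction)
    axial = 2 * n_factors
    return factorial + axial + n_center

print(design_size())
print([decode("temperature_C", c) for c in (-2, -1, 0, 1, 2)])
```

Under these assumptions each factor takes five equally spaced levels, for example 45, 50, 55, 60 and 65°C for growth temperature.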

Response surface methodology analysis
The CCRD design experimental data was used for model fitting in RSM to find the best polynomial equation. This data was analyzed using Design Expert version 6.06 (Stat Ease Inc. Minneapolis, USA) and then interpreted. Three main analytical steps: analysis of variance (ANOVA), a regression analysis and the plotting of response surface were performed to establish an optimum condition for the lipase production. Then, the predicted values obtained from RSM model, were compared with actual values for testing the model. Finally, the experimental values of predicted optimal conditions (Table 5) were used as validating set and were compared with predicted values.

Artificial neural network analysis
A commercial ANN software package, NeuralPower version 2.5 (CPC-X Software), was used throughout the study. Multilayer normal feedforward and multilayer full feedforward neural networks were used to predict the lipase activity. Networks were trained by different learning algorithms (incremental back propagation, IBP; batch back propagation, BBP; quickprop, QP; genetic algorithm, GA; and Levenberg-Marquardt algorithm, LM). The ANN architecture consisted of an input layer with six neurons, an output layer with one neuron, and a hidden layer. To determine the optimal network topology, only one hidden layer was used, and the number of neurons in this layer and the transfer functions of the hidden and output layers (sigmoid, hyperbolic tangent, Gaussian, linear, threshold linear and bipolar linear) were determined iteratively by developing several networks. Each ANN was trained until the network root mean square error (RMSE) was lower than 0.0001 and the average correlation coefficient (R) and average determination coefficient (DC) were equal to 1. Other ANN parameters were kept at the default values of the software. Initially, weights were set to random values and then adjusted through the training process in order to minimize the network error.
The CCRD experimental data were divided into training and testing sets. For training, 25 points were used (Tables 1 and 4). One strategy for finding the best model is to summarize the data: it is well established that, in ANN modeling, replicates at the center point do not improve the prediction capability of the network because of the similar inputs [10]. Hence, we improved our model by using the mean of the center points instead of the 5 individual center points (Tables 1 and 4, italic numbers). To test the network, the 4 remaining points were used (Tables 1 and 4, bold numbers). In addition, the experimental values at the predicted optimal conditions (Table 5) were used as the validating set.
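Replacing the center-point replicates by their mean can be sketched as below. The runs and activities are made up for illustration and the dictionary layout is an assumption; with the actual design, 33 runs with 5 center replicates would reduce to 29 distinct points, matching the 25 training and 4 testing points.

```python
def collapse_center_points(runs, is_center):
    """Replace replicated center runs by a single run with the mean response."""
    centers = [r for r in runs if is_center(r)]
    others = [r for r in runs if not is_center(r)]
    if not centers:
        return others
    mean_response = sum(r["activity"] for r in centers) / len(centers)
    merged = dict(centers[0], activity=mean_response)
    return others + [merged]

# Made-up runs in coded units (two factors only, to keep the example short).
runs = [
    {"x": (-1, 1), "activity": 0.10},
    {"x": (1, -1), "activity": 0.20},
    {"x": (0, 0), "activity": 0.30},
    {"x": (0, 0), "activity": 0.32},
    {"x": (0, 0), "activity": 0.34},
]
reduced = collapse_center_points(runs, lambda r: all(v == 0 for v in r["x"]))
print(len(reduced), reduced[-1])
```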

Verification of estimated data
To test the estimation capabilities of the techniques, the estimated responses obtained from RSM and the ANNs were compared with the observed responses. The coefficient of determination (R2) and absolute average deviation (AAD) were determined, and these values were used together to compare the ANNs with each other to find the best ANN model, and to compare the best ANN model with RSM. The AAD and R2 are calculated by equations 2 and 3, respectively.
In equation 2, y_{i,exp} and y_{i,cal} are the experimental and calculated responses, respectively, and p is the number of experimental runs.
In equation 3, n is the number of experimental data. R2 is a measure of the reduction in the variability of the response obtained by using the regressor variables in the model. Because R2 alone is not a measure of the model's accuracy, it is necessary to use absolute average deviation (AAD) analysis, which is a direct method for describing the deviations. Evaluating R2 and AAD together gives a better check of the accuracy of the model. R2 must be close to 1.0 and the AAD between the predicted and observed data must be as small as possible.
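The symbol definitions given for equations 2 and 3 are consistent with the following commonly used forms of AAD and R2. They are given here as an assumption based on those definitions, not as a transcription of the original equations.

```latex
\mathrm{AAD} = \left\{ \frac{1}{p} \sum_{i=1}^{p}
  \frac{\left| y_{i,\mathrm{exp}} - y_{i,\mathrm{cal}} \right|}{y_{i,\mathrm{exp}}} \right\} \times 100
\qquad (2)

R^{2} = 1 - \frac{\sum_{i=1}^{n} \left( y_{i,\mathrm{cal}} - y_{i,\mathrm{exp}} \right)^{2}}
                 {\sum_{i=1}^{n} \left( \bar{y}_{\mathrm{exp}} - y_{i,\mathrm{exp}} \right)^{2}}
\qquad (3)
```

Here \bar{y}_{\mathrm{exp}} denotes the mean of the experimental responses, a symbol introduced for this sketch.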
The acceptable values of R 2 and AAD values mean that the model equation defines the true behavior of the system and it can be used for interpolation in the experimental domain [10].
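Both measures are straightforward to compute. The sketch below assumes the usual definitions (AAD as the mean absolute relative deviation in percent, and R2 as one minus the ratio of the residual to the total sum of squares); the observed and predicted values are made up for illustration.

```python
def aad_percent(observed, predicted):
    """Absolute average deviation in percent."""
    p = len(observed)
    return 100.0 * sum(abs(o - c) / o for o, c in zip(observed, predicted)) / p

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((c - o) ** 2 for o, c in zip(observed, predicted))
    ss_tot = sum((mean_obs - o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up lipase activities (U/ml), not data from this study.
observed = [0.10, 0.20, 0.40]
predicted = [0.11, 0.19, 0.40]
aad = aad_percent(observed, predicted)
r2 = r_squared(observed, predicted)
print(round(aad, 3), round(r2, 4))
```

A model with R2 close to 1 can still show a large AAD when small responses are poorly predicted, which is why the two measures are evaluated together.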
The financial support by Universiti Putra Malaysia is gratefully acknowledged.