  • Research article
  • Open access

NanoUPLC-MSE proteomic data assessment of soybean seeds using the Uniprot database



Recombinant DNA technology has been extensively employed to generate a variety of products from genetically modified organisms (GMOs) over the last decade, and the development of technologies capable of analyzing these products is crucial to understanding gene expression patterns. Liquid chromatography coupled with mass spectrometry is a powerful tool for analyzing protein contents and possible expression modifications in GMOs. Specifically, the NanoUPLC-MSE technique provides rapid protein analysis of complex mixtures, with high sample throughput, identification and quantification from low sample quantities, and outstanding repeatability. Here, we present an assessment of peptide and protein identification and quantification in seeds of the soybean cultivar EMBRAPA BR-16 using NanoUPLC-MSE and provide a comparison to the theoretical tryptic digestion of soybean sequences from the Uniprot database.


The NanoUPLC-MSE peptide analysis resulted in 3,400 identified peptides, 58% of which had no missed cleavages. The experiment revealed that 13% of the peptides underwent in-source fragmentation, and 82% of the peptides were identified with a mass measurement error of less than 5 ppm. More than 75% of the identified proteins have at least 10 matched peptides, 88% of the identified proteins have greater than 30% sequence coverage, and 87% of the identified proteins occur in all four replicates. In total, 78% of the identified proteins correspond to glycinin and beta-conglycinin chains.

The theoretical Uniprot peptide database has 723,749 entries, and 548,336 peptides have molecular weights of greater than 500 Da. Seed proteins represent 0.86% of the protein database entries. At the peptide level, trypsin-digested seed proteins represent only 0.3% of the theoretical Uniprot peptide database. A total of 22% of all database peptides have a pI value of less than 5, and 25% of them have a pI value between 5 and 8. Based on the detection range of typical NanoUPLC-MSE experiments, i.e., 500 to 5000 Da, 64 proteins will not be identified.


NanoUPLC-MSE experiments provide good protein coverage within a peptide error of 5 ppm and a wide MW detection range from 500 to 5000 Da. A second digestion enzyme should be used depending on the tissue or proteins to be analyzed. In the case of seed tissue, trypsin protein digestion results offer good databank coverage. The Uniprot database has many duplicate entries that may result in false protein homolog associations when using NanoUPLC-MSE analysis. The proteomic profile of the EMBRAPA BR-16 seed lacks certain described proteins relative to the profiles of transgenic soybeans reported in other works.


Soybean [Glycine max (L.) Merrill] is one of the most important leguminous crops in the world and is of vital importance to the economies of many countries. Brazil is responsible for 27% of the world soybean production and is second only to the U.S., which produces 35% [1]. Soybean seed products are used in a variety of industrial goods derived from oil (58%) and protein (68%) and are used to feed both humans and animals [1].

In the last decade, efforts have been undertaken to improve soybean crop yields. To this end, genetic engineering has been extensively used to develop soybean plants with resistance or tolerance to abiotic and biotic stresses [2]. However, both the quantity of grain produced and the nutritional content of the grain are critical; therefore, the production of highly nutritional seeds of many important crops is currently a focus of research [3–5]. Furthermore, the soybean is also a viable platform for the production of recombinant pharmaceutical molecules, such as human growth hormone [6] and coagulation factor IX [7], for several reasons: soybean seeds can undergo long-term storage at ambient temperatures [8, 9]; they provide an appropriate biochemical environment for protein stability through the creation of specialized storage compartments [9, 10]; they are not contaminated by human or animal pathogens [8, 11]; their desiccation characteristics prevent non-enzymatic hydrolysis and protease degradation [11]; and they do not carry the harmful substances that are present in certain plant leaves, which is important for downstream processing [11, 12].

To enhance protein content analysis efforts, the use of technologies that permit the analysis of protein expression patterns has become a necessity in evaluating the genetic modification of these plants [5, 13–15]. The seed, leaf and root proteins of a variety of cultivars have been well documented [15, 16]. Two-dimensional gel electrophoresis (2DE) is the most commonly used technique in proteomic analysis, and many types of proteomic studies based on 2DE have been reported [17]; however, 2DE is an extremely time-consuming technique. High-throughput protein identification via 2DE requires the use of replicate gels as well as gel excision and digestion procedures [18]; these steps can be complicated and slow. Database comparisons are typically performed using peptide mass fingerprinting [19, 20], and quantification is performed by gel image intensity evaluation or by protein tagging [21, 22]. All of these stages of 2DE are time-consuming and can produce inconsistent results.

The coupling of liquid chromatography with mass spectrometry, as in NanoUPLC-MSE procedures, provides more robust high-throughput sample analysis capabilities than other techniques. Complex samples may be prepared in single vials, and all processes associated with chromatography, MS and MS/MS acquisition and database searching can be performed in a few steps [23]. These approaches have brought significant innovations, such as the ability to obtain linear sequence structural information at the femtomole level [24], small surface areas and minimal dead volumes, which minimize analyte losses due to surface adsorption, and low flow rates, which minimize the required analyte dilution [25]. Low-abundance analytes can be separated with a high recovery rate when the separation is coupled to a high-quality MS detection system with a high dynamic range [26]. In the present study, we used MSE, a data-independent acquisition method that alternates low and high collision energies without precursor selection, in contrast to data-dependent acquisition (DDA) [27]. Ion detection, clustering and the normalization of data-independent, alternate scanning LC-MSE data have been explained in detail elsewhere [27, 28].

Here, we present a statistical assessment of soybean seeds using NanoUPLC-MSE proteomic experiments and provide a comparison with the theoretical tryptic digestion of sequences from the Uniprot [29, 30] soybean database.

Results and discussion


The soybean seed NanoUPLC-MSE peptide data generated by PLGS processing are shown in Figure 1A. The experiment resulted in 3,400 identified peptides; 58% of these peptides were obtained from peptide match type data in the first pass, and 6% were obtained in the second pass [31]. A total of 17% of the peptides were identified with a missed trypsin cleavage, and 13% underwent in-source fragmentation, a rate that was expected for the Synapt G2 data. Figure 2A shows the peptide mass error in parts per million (ppm), indicating that 82% of the peptides were detected with an error of less than 5 ppm. As shown in Figure 2B, 75% of the identified proteins have at least 10 matched peptides, and 88% of the identified proteins have greater than 30% sequence coverage (Figure 2C). The experiment revealed 113 proteins, of which 87% were detected in all four replicates, as shown in Figure 1B and Table 1. These results compare favorably with other proteomic data, such as those obtained using the 2DE technique, in which only 10 to 20% of the identified proteins exhibit a coverage greater than 30% [14, 20].
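The 5 ppm criterion can be made concrete with a short calculation. The following is a minimal sketch and not the PLGS implementation; the function names and the example m/z values are illustrative assumptions.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass measurement error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def fraction_within(errors_ppm, tolerance_ppm: float = 5.0) -> float:
    """Fraction of peptides whose absolute error is within the tolerance."""
    errors_ppm = list(errors_ppm)
    within = sum(1 for e in errors_ppm if abs(e) <= tolerance_ppm)
    return within / len(errors_ppm)


# Hypothetical (observed, theoretical) m/z pairs, for illustration only.
pairs = [(1000.004, 1000.0), (500.001, 500.0), (800.016, 800.0)]
errors = [ppm_error(obs, theo) for obs, theo in pairs]
print([round(e, 1) for e in errors])
print(fraction_within(errors))
```

Applied to a real peptide table, the second function would give the share of peptides under the chosen error limit, which is the quantity reported as 82% in the text.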

Figure 1

Peptide detection type, repetition rate, and protein function chart. A) For peptide match type, PepFrag1 and PepFrag2 correspond to the peptide matches obtained by PLGS when compared to the database, VarMod corresponds to variable modifications, InSource corresponds to fragmentation that occurred in the ionization source, MissedCleavage indicates missed trypsin cleavages, and Neutral loss H2O and NH3 correspond to water and ammonia precursor losses; B) Repeat rate indicates the number of replicates in which an identified protein appears; C) Protein function of the identified proteins, clustered into storage, defense, energy processing, embryogenesis, seed maturation or other functions.

Figure 2

Experimental ppm error at the peptide level, number of identified peptides per protein and experimental protein sequence coverage. A) Number of identified peptides within a 5 ppm error range; B) Number of identified peptides per protein; C) Number of proteins with sequence coverage from 10 to 90%.
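Sequence coverage, as plotted in panel C, is the percentage of a protein's residues that fall inside at least one identified peptide. The sketch below shows one straightforward way to compute it; the sequence and peptides are made-up placeholders, and the way PLGS handles repeated or modified peptides may differ.

```python
def sequence_coverage(protein: str, peptides) -> float:
    """Percentage of residues covered by at least one matched peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue of this occurrence as covered.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 13-residue sequence and matched peptides.
protein = "AAKGGRCCKDDDR"
print(round(sequence_coverage(protein, ["AAK", "GGR"]), 1))
```

Overlapping peptides are counted once per residue, so a missed-cleavage peptide that spans two already matched peptides does not inflate the value.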

Table 1 Protein identification table

Figure 3 shows the results obtained from dynamic range detection, indicating that 95 proteins were quantified. A 3-log range and a good detection distribution of high and low molecular weights were obtained, as indicated by the size of the squares. The 10 least abundant proteins include B3TDK5_SOYBN-Lipoxygenase, C7EA91_SOYBN-Mutant glycinin subunit A1aB1b, Q588Z3_SOYBN-Beta amylase, KTI1_SOYBN-Kunitz type trypsin inhibitor, LOX1_SOYBN-Seed lipoxygenase 1, Q4LER6_SOYBN-Beta conglycinin alpha prime, C7EA92_SOYBN-Mutant glycinin subunit A1aB1b, C6T7Y4_SOYBN-Putative uncharacterized protein, Q5K3Q9_SOYBN-Putative dehydrin (Fragment) and ITRB_SOYBN-Trypsin inhibitor B. The 10 most abundant proteins are GLYG1_SOYBN-Glycinin G1, Q549Z4_SOYBN-Proglycinin A2B1, Q4LER5_SOYBN-Beta conglycinin alpha subunit, Q9ATY1_SOYBN-Kunitz trypsin inhibitor, Q3V5S6_SOYBN-Beta conglycinin alpha subunit, GLYG2_SOYBN-Glycinin G2, C6SWW4_SOYBN-Putative uncharacterized protein, Q39898_SOYBN-Kunitz trypsin inhibitor, GLYG5_SOYBN-Glycinin and LEC_SOYBN-Lectin. A detailed description of each protein is presented in Table 1. A comparison of our results with other proteomic data reveals a discrepancy in the number of proteins that were identified. Barbosa et al. [14] described 192 identified proteins, although these 192 2DE spots likely correspond to a lower number of proteins because many of the identifications are associated with the same protein with a pI shift. The same trend can be observed in the work presented by Mooney et al. [20], which described 96 identifications of 150 spots detected via 2DE. Sakata et al. [32] described more than 500 spots in gels from cotyledons but reported only 34 identified proteins. Our results mainly identify single proteins. However, there were some exceptions, especially for proteins that possess subunits with similar amino acid sequences, such as glycinin and beta-conglycinin, but are listed under different accession numbers in the Uniprot database.

Figure 3

Detection dynamic range of the experiment. The white square represents rabbit phosphorylase B, used as an external standard. Dark squares indicate proteins identified in the experiment. The size of each square represents the molecular weight of the identified protein.
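A "3-log" dynamic range means that the most and least abundant quantified proteins differ by about three orders of magnitude. As a minimal sketch, assuming a list of quantified protein amounts (the values and the fmol unit below are hypothetical, not data from this experiment), the range can be expressed as a base-10 logarithm:

```python
import math


def dynamic_range_log10(amounts) -> float:
    """Orders of magnitude between the largest and smallest positive amount."""
    positive = [a for a in amounts if a > 0]
    return math.log10(max(positive) / min(positive))


# Hypothetical on-column amounts in fmol, for illustration only.
amounts_fmol = [2.5, 40.0, 310.0, 2500.0]
print(round(dynamic_range_log10(amounts_fmol), 2))
```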

We also compared the identified proteins and correlated them to their function. All identified proteins having at least two replicates are shown in Table 1. A total of 42% of these data correspond to storage proteins, as shown in Figure 1C. In terms of total soluble protein (TSP), 78% of the identified species correspond to glycinin and beta-conglycinin chains: GLYG1_SOYBN-Glycinin and Q549Z4_SOYBN-Proglycinin A2B1 correspond to 18% and 12% of TSP, respectively [33]; 13% correspond to inhibitors, including Q9ATY1_SOYBN Kunitz-type inhibitor (2.35% of TSP) and Q9SBA9_SOYBN Bowman-Birk proteinase inhibitor (0.06% of TSP) [34]; 16% correspond to energy-related proteins, such as C0J370_SOYBN Ribulose (0.33% of TSP), Q42795_SOYBN beta-amylases (0.57% of TSP) and B3TDK4_SOYBN lipoxygenase (1.14% of TSP); and 17% are associated with abiotic stress, including Q7XAW0_SOYBN dehydrin (1% of TSP) and LEA proteins (<0.01% of TSP) [35]. An additional 13% correspond to putative uncharacterized proteins. Late embryogenesis proteins and maturation proteins represent 7% of the proteins, including Q39871_SOYBN (0.34% of TSP), P93165_SOYBN Em protein (0.2% of TSP) and Q9LLQ6_SOYBN seed maturation protein (0.13% of TSP) [36, 37]. These results are in agreement with those of other studies [15, 17, 20, 32].

Uniprot data assessment

There are 13,117 soybean sequence entries in the Uniprot database. The theoretical tryptic digestion results show 368,435 peptides. Assuming one missed cleavage, the theoretical peptide database has 723,749 entries, 548,336 of which possess a molecular weight greater than 500 Da. These results and a comparison with proteomic data are presented in Figure 4. Seed proteins represent 0.86% of the protein database entries. At the peptide level, trypsin-digested seed proteins represent only 0.3% of the theoretical peptide database, including missed cleavage proteins, which are responsible for only 0.08% of the identified data. Of the seed proteins detected in our experiments, 78% have a pI value between 4.2 and 6. This result is presented in Figure 5, which shows that seed proteins have acidic characteristics. This characteristic was also reported by Robic et al. [38]. At the peptide level (Figure 6), 22% of all database peptides have a pI of less than 5, and 25% of them have a pI between 5 and 8. Figure 6 also shows that 43% of peptides resulting from the experiments have a pI value of less than 5. This pattern is characteristic of tryptic digestion and LC-ESI-MS experiments because the method favors charged peptides.
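The pI classes used in this comparison (below 5, between 5 and 8, above 8) depend on how the isoelectric point is estimated. The sketch below is one common approach, a bisection search for the pH at which the Henderson-Hasselbalch net charge is zero. The pKa values are an illustrative set chosen for this example and are an assumption; the in-house tool used in this work may use different values, and pI estimates shift with the pKa set.

```python
# Illustrative side-chain and terminal pKa values (assumed for this sketch).
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence:
        if residue in PKA_BASIC:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[residue]))
        elif residue in PKA_ACIDIC:
            negative += 1.0 / (1.0 + 10 ** (PKA_ACIDIC[residue] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """pH at which the net charge crosses zero, found by bisection."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid  # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


def pi_bin(pi: float) -> str:
    """Assign a pI value to the classes used in the text."""
    if pi < 5.0:
        return "pI < 5"
    if pi <= 8.0:
        return "pI 5 to 8"
    return "pI > 8"


for peptide in ["DDEEG", "AAKGGR"]:  # hypothetical peptides
    pi = isoelectric_point(peptide)
    print(peptide, round(pi, 2), pi_bin(pi))
```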

Figure 4

Distribution of peptides and proteins in the Uniprot database and seed proteomics by NanoUPLC-MSE. A) Corresponds to the database proteins; B) Indicates the proteins identified in this work.

Figure 5

Distribution of seed proteins identified within the Uniprot protein database by the isoelectric point. Data shown in dark squares correspond to database proteins, and data shown in white squares are hits identified by this work.

Figure 6

Distribution of peptide isoelectric points for the database and identified peptides. Dark squares correspond to database peptides. White squares represent peptides identified using NanoUPLC-MSE.

In Table 2, we present the number of proteins that are not detected within a particular peptide molecular mass detection range. When assuming the minimum and maximum peptide detection levels found using NanoUPLC-MSE experiments, i.e., 500 to 5000 Da, 64 proteins do not have detectable peptides after trypsin digestion (Tables 2 and 3). The majority of these proteins correspond to putative and uncharacterized proteins, although NU6C_SOYBN NAD(P)H-quinone oxidoreductase is within the detection range and is not related to seed proteins. Assuming 1 peptide at an upper threshold of 5,000 Da (Table 3), a few seed proteins are not detected by NanoUPLC-MSE: ACT6_SOYBN Actin-6, ACT7_SOYBN Actin-7, ALL50_SOYBN Major Gly 50 kDa allergen and Q7M212_SOYBN Water-soluble 35K protein. With 2 peptides at an upper threshold of 5,000 Da (Table 3), several putative proteins are not detected, including Q3HM31_SOYBN Hydrophobic seed protein and Q692Y3_SOYBN Glycinin gy1 (Fragment). With 3 peptides at an upper threshold of 5,000 Da (Table 3), the list of undetected proteins is mainly composed of putative and uncharacterized proteins and other protein fragments that have short amino acid sequences.
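The counts behind Tables 2 and 3 amount to asking, for each database protein, how many of its theoretical peptides fall inside the detection window. A minimal sketch of that tally follows; the accession labels and peptide masses are hypothetical placeholders, and the window defaults to the 500 to 5000 Da range quoted in the text.

```python
from collections import Counter


def peptides_in_window(masses, low: float = 500.0, high: float = 5000.0) -> int:
    """Number of peptide masses (Da) inside the detection window."""
    return sum(1 for mass in masses if low <= mass <= high)


def tally_by_detectable_count(protein_masses, low=500.0, high=5000.0) -> Counter:
    """Map 'number of detectable peptides' to 'number of proteins'."""
    return Counter(
        peptides_in_window(masses, low, high)
        for masses in protein_masses.values()
    )


# Hypothetical proteins with theoretical tryptic peptide masses in Da.
protein_masses = {
    "PROT_A": [288.2, 412.9],             # all peptides too small
    "PROT_B": [350.1, 6120.4],            # too small or too large
    "PROT_C": [288.2, 1570.7],            # one detectable peptide
    "PROT_D": [845.4, 1203.6, 2310.1],    # three detectable peptides
}
tally = tally_by_detectable_count(protein_masses)
print(tally[0], tally[1], tally[2], tally[3])
```

Changing `low` and `high` reproduces the idea of Table 2, where the number of proteins with zero detectable peptides is reported for several detection ranges.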

Table 2 Number of proteins with 0 peptides over a given peptide detection range
Table 3 Uniprot access codes for proteins containing 0, 1, 2 or 3 peptides for a given database peptide detection threshold

Many of these undetected proteins have been found in soybean seeds and described in other studies [15, 17, 20, 32]. An analysis of the undetected proteins in the database shows that the majority of the sequences are composed of short amino acid sequences with at most 20 residues. This observation may explain the level of missed detection in the NanoUPLC experiments. Other proteins that are not described in this work, such as glyceraldehyde 3-phosphate (Q2I0H4_SOYBN), Malate dehydrogenase (B0M1B0_SOYBN), Glutathione S-transferase (C6ZQJ7_SOYBN), Isoflavone reductase (Q9SDZ0_SOYBN), Alcohol dehydrogenase 1 (Q8LJR2_SOYBN) and In2-1 protein (Q9FQ95_SOYBN), have been described as soybean seed proteins. These proteins have been described in a previous study on the proteomics of transgenic soybean seeds expressing CTAG recombinant proteins [23]. Further experiments must be performed to clarify this issue.

We hypothesize that environmental stress may have altered the seed expression profiles because the EMBRAPA BR-16 seeds were cultivated in the field, and the transgenic seeds were grown in a greenhouse. For example, Barbosa et al. [14] and Brandão et al. [22] reported different expression levels of enzymes in the transgenic soybean proteome of Monsanto Roundup-ready seeds. The authors state that the genetic modification itself could be a stress factor and may produce alterations in the seed proteome. A comparison between the results of this work and our previous study provides evidence in support of this hypothesis, indicating the need for further experiments to confirm possible proteome alterations due to genetic modification. Nevertheless, highly hydrophobic or insoluble proteins will not be detected due to the necessity for in-solution protease digestion; special protocols are needed for the digestion of these types of protein.


NanoUPLC-MSE experiments are a viable choice as a proteomic pipeline for soybean protein detection. NanoUPLC-MSE provides good protein coverage with a 5 ppm peptide error, reduced sample manipulation relative to other techniques and detection of a wide range of peptide MWs, i.e., from 500 to 5000 Da. Because not all proteins from the Uniprot database are covered, the use of a second digestion enzyme is recommended depending on the tissue to be analyzed. In the case of seed tissue, trypsin protein digestion results in good database coverage. The Uniprot database has many duplicate entries that may result in false protein homolog associations; the database must therefore be curated prior to use, or only the reviewed sequences should be used. It also has many fragment entries that are not suitable for NanoUPLC-MSE analysis but may be used in other techniques. The proteomic profile of EMBRAPA BR-16 seed lacks certain described proteins relative to transgenic soybean profiles reported in other studies. This discrepancy demonstrates the need for further transgenic and nontransgenic proteome analyses.


Extraction of total soluble protein from soybean seeds

Seeds from the EMBRAPA BR-16 cultivar were used in this work. The soybean seeds were ground to a fine powder using a coffee grinder. A 100 mg sample of powder was weighed and placed in a 2 mL capped centrifuge tube. Petroleum ether (1 mL) was added, and the sample was gently agitated for 15 min. The supernatant was discarded, and this step was repeated twice. The petroleum ether was evaporated for 10 min, and 1 mL of 20 mM Tris–HCl pH 8.3, 1.5 mM KCl, 10 mM DTT, 1 mM PMSF and 0.1% (v/v) SDS was added. The sample was slowly vortexed at room temperature for 10 min and centrifuged for 5 min at 10,000 × g at 4°C. The supernatant was then transferred to a new centrifuge tube. For each 200 μL of sample, 800 μL of cold acetone was added to the centrifuge tube. The sample was vortexed thoroughly and incubated at −20°C for 1 h with vortexing performed every 15 min. The sample was then centrifuged for 10 min at 15,700 × g. The supernatant was discarded, and the pellet was dried at room temperature for 30 min. The pellet was carefully dissolved in 500 μL of 50 mM ammonium bicarbonate and quantified using a Quant-iT™ Protein Assay Kit (Invitrogen, USA). The sample was finally diluted with 50 mM ammonium bicarbonate to a protein concentration of 1 μg/μL.

Sample preparation for NanoUPLC-MSE acquisition

A 50 μL aliquot of the 1 μg/μL sample was added to 10 μL of 50 mM ammonium bicarbonate in a microcentrifuge tube. Then, 25 μL of 0.2% (v/v) RapiGEST™ (Waters, USA) was added, and the sample was vortexed and incubated in a dry bath at 80°C for 15 min. The sample was briefly centrifuged, and 2.5 μL of 100 mM DTT was added. The sample was vortexed gently and incubated at 60°C for 30 min followed by centrifugation. Iodoacetamide (2.5 μL of a 300 mM solution) was added, and the sample was briefly vortexed and incubated in the dark at room temperature for 30 min. Then, 10 μL of trypsin (with 400 μL of 50 mM ammonium bicarbonate added per 20 μg vial of trypsin) was added, and the sample was briefly vortexed. The sample was digested at 37°C in a dry bath overnight. To cleave and precipitate the RapiGEST™, 10 μL of a 5% TFA solution was added, and the sample was vortexed, incubated for 90 min at 37°C in a dry bath, and centrifuged at 18,000 × g at 6°C for 30 min. The supernatant was transferred to a Waters Total Recovery vial (Waters, USA), and 5 μL of Rabbit Phosphorylase B (Waters, part number 186002326) (with 1 mL of 3% acetonitrile and 0.1% formic acid) and 85 μL of a 3% acetonitrile and 0.1% formic acid solution were added. The final concentration of the protein was 250 ng/μL, and the final concentration of Phosphorylase B was 25 fmol/μL. The final volume was 200 μL.
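The stated final volume and concentrations can be checked from the volumes listed in this protocol. The sketch below only redoes that arithmetic; it assumes no volume is lost during incubation, centrifugation or transfer, and the phosphorylase B stock concentration it prints is implied by the stated final values rather than given in the text.

```python
# Volumes added during sample preparation, in microliters (from the protocol).
additions_ul = {
    "protein sample (1 ug/uL)": 50.0,
    "50 mM ammonium bicarbonate": 10.0,
    "RapiGEST": 25.0,
    "100 mM DTT": 2.5,
    "300 mM iodoacetamide": 2.5,
    "trypsin": 10.0,
    "5% TFA": 10.0,
    "phosphorylase B standard": 5.0,
    "3% acetonitrile / 0.1% formic acid": 85.0,
}

total_volume_ul = sum(additions_ul.values())

# 50 uL of a 1 ug/uL sample carries 50 ug = 50,000 ng of protein.
protein_ng = 50.0 * 1.0 * 1000.0
protein_conc_ng_per_ul = protein_ng / total_volume_ul

# Trypsin: 20 ug dissolved in 400 uL, of which 10 uL is used.
trypsin_ug = 10.0 * 20.0 / 400.0
enzyme_to_protein_ratio = trypsin_ug / 50.0

# Standard: 25 fmol/uL in the final volume, delivered in 5 uL of stock.
standard_total_fmol = 25.0 * total_volume_ul
implied_stock_fmol_per_ul = standard_total_fmol / 5.0  # implied, not stated

print(total_volume_ul, protein_conc_ng_per_ul)
print(trypsin_ug, enzyme_to_protein_ratio, implied_stock_fmol_per_ul)
```

Under these assumptions the volumes sum to the stated 200 μL and the protein concentration matches the stated 250 ng/μL, with an enzyme-to-protein ratio of 1:100 (w/w).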


The nanoscale LC separation of tryptic peptides from TSP was performed using a nanoACQUITY™ system (Waters Corp., USA) equipped with a Symmetry C18 5 μm, 5 mm × 300 μm precolumn and a nanoEase™ BEH130 C18 1.7 μm, 100 μm × 100 mm analytical reversed-phase column (Waters, USA). The samples were initially transferred to the precolumn using an aqueous 0.1% formic acid solution with a flow rate of 5 μL/min for 2 min. Mobile phase A consisted of 0.1% formic acid in water, and mobile phase B consisted of 0.1% formic acid in acetonitrile. The peptides were separated using a gradient of 3-40% mobile phase B for 200 min with a flow rate of 600 nL/min followed by a 10 min rinse with 85% of mobile phase B. The column was re-equilibrated to the initial conditions for 20 min. The column temperature was maintained at 35°C. The lock mass was delivered from the fluidics system of a Synapt G2 pump using a constant flow rate of 400 nL/min at a concentration of 200 fmol of GFP ([Glu1]-fibrinopeptide B) to the reference sprayer of the NanoLockSpray source of the mass spectrometer. All samples were analyzed in four replicates.

The tryptic peptides were analyzed using a Synapt G2 HDMS™ mass spectrometer (Waters, Manchester, UK) with a hybrid quadrupole/ion mobility/orthogonal acceleration time-of-flight (oa-TOF) geometry. For all measurements, the mass spectrometer was operated in the sensitive mode of analysis with a typical resolving power of at least 10,000 (full width at half maximum, FWHM). All analyses were performed using a positive nanoelectrospray ion mode (nanoESI +). The time-of-flight analyzer of the mass spectrometer was externally calibrated with GFP b+ and y+ ions from 50 to 1990 m/z, and the data were lock mass corrected post acquisition using the GFP doubly charged precursor ion [M + 2H]2+ = 785.8426. The reference sprayer was sampled at a frequency of 30 s. The exact mass retention time (EMRT) [28] nanoLC-MSE data were collected in an alternating low-energy and elevated-energy acquisition mode. The continuum spectra acquisition time in each mode was 1.5 s with a 0.1 s interscan delay. In the low-energy MS mode, data were collected at a constant collision energy of 3 eV. In the elevated-energy MS mode, the collision energy was increased from 12 to 45 eV during each 1.5 s spectrum. The radiofrequency that was applied to the quadrupole mass analyzer was adjusted such that the ions from 50 to 2000 m/z were efficiently transmitted, which ensured that any ions less than 50 m/z observed in the LC-MS data were only derived from dissociations in the TRAP T-wave collision cell.

Data processing and protein identification

The MS data that were obtained from the LC-MSE analysis were processed and searched using the ProteinLynx Global Server (PLGS) version 2.5 (Waters, Manchester, UK). Proteins were identified using the software’s embedded ion accounting algorithm and a search of the Glycine max database with MassPREP digestion standards (MPDS) UniProtKB/Swiss-Prot sequences (Phosphorylase - P00489 - PHS2_RABIT, Bovine Hemoglobin - P02070 - HBB_BOVIN, ADH - P00330 - ADH1_YEAST, BSA - P02769 - ALBU_BOVIN) that were appended to the database. Identifications and quantitative data packaging were performed using dedicated algorithms [28, 31] and a search against a soybean Uniprot database. The ion detection, clustering, and log-scale parametric normalizations were performed in PLGS with an ExpressionE license installed. The intensity measurements were typically adjusted for these components, i.e., the deisotoped and charge state-reduced EMRTs that were replicated throughout the entire experiment for the analysis at the EMRT cluster level. The fixed modification of carbamidomethyl-C was specified, and the included variable modifications were acetylation of the N-terminus, deamidation of N, deamidation of Q and oxidation of M. Components were typically clustered with a 10 ppm mass precision and a 0.25 min time tolerance against the database-generated theoretical peptide ion masses with a minimum of one matched peptide. The alignment of elevated-energy ions with low-energy precursor peptide ions was performed with an approximate precision of 0.05 min. One missed cleavage site was allowed. The precursor and fragment ion tolerances were determined automatically. The protein identification criteria also included the detection of at least three fragment ions per peptide, 6 fragments per protein and the determination of at least one peptide per protein; the identification of the protein was allowed with a maximum 4% false positive discovery rate in at least four technical replicate injections. Using protein identification replication as a filter, the false positive rate was minimized because false positive protein identifications, i.e., chemical noise, have a random nature and do not tend to replicate across injections. For the analysis of the protein identification and quantification level, the observed intensity measurements were normalized to the intensity measurement of the identified peptides of the digested internal standard. Protein tables generated by PLGS were merged, and the dynamic range of the experiment was calculated using the in-house software program MassPivot by setting the minimum repeat rate for each protein in all replicates to 2.
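The replication filter described here keeps only the proteins that recur across injections. The sketch below illustrates the idea with plain Python sets; it is not the MassPivot implementation, and the accession labels and replicate contents are hypothetical placeholders.

```python
from collections import Counter


def replicated_proteins(replicates, min_repeats: int = 2):
    """Return accessions identified in at least `min_repeats` replicates."""
    counts = Counter()
    for identified in replicates:
        # set() ensures a protein is counted once per replicate.
        counts.update(set(identified))
    return {acc for acc, n in counts.items() if n >= min_repeats}


# Hypothetical identifications from four replicate injections.
replicates = [
    ["PROT_A", "PROT_B", "PROT_C"],
    ["PROT_A", "PROT_B"],
    ["PROT_A", "PROT_B", "NOISE_1"],
    ["PROT_A", "NOISE_2"],
]
print(sorted(replicated_proteins(replicates, min_repeats=2)))
print(sorted(replicated_proteins(replicates, min_repeats=4)))
```

Identifications that appear in a single injection, such as the `NOISE_*` placeholders, drop out as soon as the minimum repeat rate is 2, which is the behavior the filter relies on to suppress random false positives.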

Uniprot soybean database digestion and experiment analysis

Glycine max protein sequences were obtained from Uniprot (, and the theoretical tryptic digestion was performed using the in-house software Digestion tool. The digestion was performed allowing 1 missed cleavage, and the molecular mass and isoelectric point of all peptides and proteins were calculated. The peptide and protein tables from PLGS were compared with the database digestion table using the Spotfire software (, suitable graphics were generated for all data. Microsoft Excel (Microsoft, USA) was used for table manipulations.
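The in-house Digestion tool is not described in detail, so the following is only a sketch of a theoretical tryptic digestion with one missed cleavage. It assumes the commonly used rule of cleaving C-terminal to K or R except before P, uses standard monoisotopic residue masses without modifications (for example, no carbamidomethylation of C), and runs on a made-up sequence.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def cleave(sequence: str):
    """Fully cleaved tryptic pieces (after K/R, not before P; assumed rule)."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def digest(sequence: str, missed_cleavages: int = 1):
    """All peptides with up to `missed_cleavages` missed cleavage sites."""
    pieces = cleave(sequence)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


peptides = digest("AAKGGRCC")  # hypothetical sequence
detectable = [p for p in peptides if 500.0 <= monoisotopic_mass(p) <= 5000.0]
print(peptides)
print(detectable)
```

With one missed cleavage, a protein that yields n fully cleaved pieces contributes n − 1 additional peptides. As a rough check against the database figures quoted in the text, 2 × 368,435 − 13,117 = 723,753, which is close to the reported 723,749 entries.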


  1. Soystats.,

  2. BASF and Embrapa’s Cultivance soybeans receive approval for commercial cultivation in Brazil.,

  3. Aluru MR, Rodermel SR, Reddy MB: Genetic modification of low phytic Acid 1–1 maize to enhance iron content and bioavailability. J Agric Food Chem. 2011, 59 (24): 12954-12962. 10.1021/jf203485a.

  4. Drakakaki G, Marcel S, Glahn RP, Lund EK, Pariagh S, Fischer R, Christou P, Stoger E: Endosperm-specific co-expression of recombinant soybean ferritin and Aspergillus phytase in maize results in significant increases in the levels of bioavailable iron. Plant Mol Biol. 2005, 59 (6): 869-880. 10.1007/s11103-005-1537-3.

  5. Herman EM, Helm RM, Jung R, Kinney AJ: Genetic modification removes an immunodominant allergen from soybean. Plant Physiol. 2003, 132 (1): 36-43. 10.1104/pp.103.021865.

  6. Cunha NB, Murad AM, Cipriano TM, Araujo ACG, Aragao FJL, Leite A, Vianna GR, McPhee TR, Souza GHMF, Waters MJ, et al: Expression of functional recombinant human growth hormone in transgenic soybean seeds. Transgenic Res. 2010, 20 (4): 811-826.

  7. Cunha NB, Murad AM, Ramos GL, Maranhao AQ, Brıgido MM, Araujo ACG, Lacorte C, Aragao FJL, Covas DT, Fontes AM, et al: Accumulation of functional recombinant human coagulation factor IX in transgenic soybean seeds. Transgenic Res. 2010, 20 (4): 841-855.

  8. Boothe J, Nykiforuk C, Shen Y, Zaplachinski S, Szarka S, Kuhlman P, Murray E, Morck D, Moloney MM: Seed-based expression systems for plant molecular farming. Plant Biotechnol J. 2010, 8: 588-606. 10.1111/j.1467-7652.2010.00511.x.

  9. Cunha NB, Araújo ACG, Leite A, Murad AM, Vianna GR, Rech EL: Correct targeting of proinsulin in protein storage vacuoles of transgenic soybean seeds. Genet Mol Res. 2010, 9 (2): 1163-1170.

  10. Jolliffe NA, Craddock CP, Frigerio L: Pathways for protein transport to seed storage vacuoles. Biochem Soc Trans. 2005, 33: 1016-1018. 10.1042/BST20051016.

  11. Ma JK-C, Drake PMW, Christou P: The production of recombinant pharmaceutical proteins in plants. Nat Rev Genet. 2003, 4: 794-805.

  12. Tremblay R, Wang D, Jevnikar AM, Ma S: Tobacco, a highly efficient green bioreactor for production of therapeutic proteins. Biotechnol Adv. 2010, 28: 214-221. 10.1016/j.biotechadv.2009.11.008.

  13. Kim Y-H, Choi SJ, Lee H-A, Moon TW: Quantitation of CP4 5-Enolpyruvylshikimate-3-Phosphate synthase in soybean by two-dimensional gel electrophoresis. J Microbiol Biotechnol. 2006, 16 (1): 25-31.

  14. Barbosa HS, Arruda SC, Azevedo RA, Arruda MA: New insights on proteomics of transgenic soybean seeds: evaluation of differential expressions of enzymes and proteins. Anal Bioanal Chem. 2012, 402 (1): 299-314. 10.1007/s00216-011-5409-1.

  15. Natarajan SS, Xu C, Bae H, Caperna TJ, Garrett WM: Characterization of storage proteins in wild (Glycine soja) and cultivated (Glycine max) soybean seeds using proteomic analysis. J Agric Food Chem. 2006, 54 (8): 3114-3120. 10.1021/jf052954k.

  16. Aghaei K, Ehsanpour AA, Shah AH, Komatsu S: Proteome analysis of soybean hypocotyl and root under salt stress. Amino Acids. 2009, 36 (1): 91-98. 10.1007/s00726-008-0036-7.

  17. Komatsu S, Ahsan N: Soybean proteomics and its application to functional analysis. J Proteomics. 2009, 72 (3): 325-336. 10.1016/j.jprot.2008.10.001.

  18. Shevchenko A, Tomas H, Havlis J, Olsen JV, Mann M: In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc. 2006, 1 (6): 2856-2860.

  19. Yang Y, Zhang S, Howe K, Wilson DB, Moser F, Irwin D, Thannhauser TW: A comparison of nLC-ESI-MS/MS and nLC-MALDI-MS/MS for GeLC-based protein identification and iTRAQ-based shotgun quantitative proteomics. J Biomol Tech. 2007, 18: 226-237.

  20. Mooney BP, Krishnan HB, Thelen JJ: High-throughput peptide mass fingerprinting of soybean seed proteins: automated workflow and utility of UniGene expressed sequence tag databases for protein identification. Phytochemistry. 2004, 65 (12): 1733-1744. 10.1016/j.phytochem.2004.04.011.

  21. Nogueira SB, Labate CA, Gozzo FC, Pilau EJ, Lajolo FM, Oliveira do Nascimento JR: Proteomic analysis of papaya fruit ripening using 2DE-DIGE. J Proteomics. 2012, 75 (4): 1428-1439. 10.1016/j.jprot.2011.11.015.

  22. Brandao AR, Barbosa HS, Arruda MA: Image analysis of two-dimensional gel electrophoresis for comparative proteomics of transgenic and non-transgenic soybean seeds. J Proteomics. 2010, 73 (8): 1433-1440. 10.1016/j.jprot.2010.01.009.

  23. Murad AM, Souza GH, Garcia JS, Rech EL: Detection and expression analysis of recombinant proteins in plant-derived complex mixtures using nanoUPLC-MS(E). J Sep Sci. 2011, 34 (19): 2618-2630. 10.1002/jssc.201100238.

  24. Shen Y, Zhao R, Berger SJ, Anderson GA, Rodriguez N, Smith RD: High-efficiency nanoscale liquid chromatography coupled on-line with mass spectrometry using nanoelectrospray ionization for proteomics. Anal Chem. 2002, 74: 4235-4249. 10.1021/ac0202280.

  25. Liu H, Finch JW, Lavallee MJ, Collamati RA, Benevides CC, Gebler JC: Effects of column length, particle size, gradient length and flow rate on peak capacity of nano-scale liquid chromatography for peptide separations. J Chromatogr A. 2007, 1147 (1): 30-36. 10.1016/j.chroma.2007.02.016.

  26. Levin Y, Wang L, Ingudomnukul E, Schwarz E, Baron-Cohen S, Palotás A, Bahn S: Real-time evaluation of experimental variation in large-scale LC–MS/MS-based quantitative proteomics of complex samples. J Chromatogr B. 2009, 877: 1299-1305. 10.1016/j.jchromb.2008.11.007.

  27. Geromanos SJ, Vissers JPC, Silva JC, Dorschel CA, Li G-Z, Gorenstein MV, Bateman RH, Langridge JI: The detection, correlation, and comparison of peptide precursor and product ions from data independent LC-MS with data dependant LC-MS/MS. Proteomics. 2009, 9 (6): 1683-1695. 10.1002/pmic.200800562.

  28. Silva JC, Gorenstein MV, Li G-Z, Vissers JPC, Geromanos SJ: Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition. Mol Cell Proteomics. 2005, 5 (1): 144-156. 10.1074/mcp.M500230-MCP200.

  29. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al: The universal protein resource (UniProt). Nucleic Acids Res. 2005, 33 (Database issue): D154-D159.

  30. Leinonen R, Diez FG, Binns D, Fleischmann W, Lopez R, Apweiler R: UniProt archive. Bioinformatics. 2004, 20 (17): 3236-3237. 10.1093/bioinformatics/bth191.

  31. Li G-Z, Vissers JPC, Silva JC, Golick D, Gorenstein MV, Geromanos SJ: Database searching and accounting of multiplexed precursor and product ion spectra from the data independent analysis of simple and complex peptide mixtures. Proteomics. 2009, 9: 1696-1719. 10.1002/pmic.200800564.

  32. Sakata K, Ohyanagi H, Nobori H, Nakamura T, Hashiguchi A, Nanjo Y, Mikami Y, Yunokawa H, Komatsu S: Soybean proteome database: a data resource for plant differential omics. J Proteome Res. 2009, 8 (7): 3539-3548. 10.1021/pr900229k.

  33. Shutov AD, Kakhovskaya IA, Bastrygina AS, Bulmaga VP, Horstmann C, Muntz K: Limited proteolysis of beta-conglycinin and glycinin, the 7S and 11S storage globulins from soybean [Glycine max (L.) Merr.]. Structural and evolutionary implications. Eur J Biochem. 1996, 241 (1): 221-228. 10.1111/j.1432-1033.1996.0221t.x.

  34. Lee KJ, Kim JB, Ha BK, Kim SH, Kang SY, Lee BM, Kim DS: Proteomic characterization of Kunitz trypsin inhibitor variants, Tia and Tib, in soybean [Glycine max (L.) Merrill]. Amino Acids. 2011, 43 (1): 379-388.

  35. Wang W, Vinocur B, Altman A: Plant responses to drought, salinity and extreme temperatures: towards genetic engineering for stress tolerance. Planta. 2003, 218 (1): 1-14. 10.1007/s00425-003-1105-5.

  36. Hill JE, Breidenbach RW: Proteins of soybean seeds: II. Accumulation of the major protein components during seed development and maturation. Plant Physiol. 1974, 53 (5): 747-751. 10.1104/pp.53.5.747.

  37. Hsing YC, Tsou CH, Hsu TF, Chen ZY, Hsieh KL, Hsieh JS, Chow TY: Tissue- and stage-specific expression of a soybean (Glycine max L.) seed-maturation, biotinylated protein. Plant Mol Biol. 1998, 38 (3): 481-490. 10.1023/A:1006079926339.

  38. Robic G, Farinas CS, Rech EL, Miranda EA: Transgenic soybean seed as protein expression system: aqueous extraction of recombinant beta-glucuronidase. Appl Biochem Biotechnol. 2010, 160 (4): 1157-1167. 10.1007/s12010-009-8637-5.


Acknowledgements

This work was supported by the Brazilian Agricultural Research Corporation (EMBRAPA), the National Council for Scientific and Technological Development (CNPq) and the Fundação de Apoio a Pesquisa-DF (FAP-DF). The authors acknowledge support from C. Bloch at the Mass Spectrometry Laboratory-EMBRAPA.

Author information

Corresponding author

Correspondence to Elibio L Rech.

Additional information

Authors’ contributions

All authors contributed equally to this work. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Murad, A.M., Rech, E.L. NanoUPLC-MSE proteomic data assessment of soybean seeds using the Uniprot database. BMC Biotechnol 12, 82 (2012).
