The method described here is based on cocktails of forward and reverse PCR oligonucleotides designed to identify and index the genetic diversity known to occur at a target region in any species of interest. Using a primer cocktail considerably expands the choice of polymorphic target sequences, because the target does not need to be flanked by fixed regions, as is required for the design of conventional primers. This makes it possible to choose target sequences short enough for the test to remain efficient on processed products in which DNA is degraded. The fixed 5' regions of the PCR primers are sufficiently long to allow sequencing of small fragments (< 100 bp) without the need for cloning. Using 'read start' sequences as indexing markers can be useful for fast alignment and identification of the target of different species in high-throughput applications (Figure 1).
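The composition of such a cocktail can be illustrated with a minimal Python sketch. It assumes that each primer consists of a shared fixed 5' tail followed by one known variant of the polymorphic priming site; the function name, the tail and the site variants below are hypothetical placeholders, not the oligonucleotides used in this study.

```python
def build_primer_cocktail(fixed_tail, site_variants):
    """Return one primer per distinct priming-site variant.

    Each primer is the shared fixed 5' tail followed by one variant of the
    polymorphic priming site. Duplicate variants (ignoring case) are
    collapsed, and the order of first appearance is preserved.
    """
    unique_sites = list(dict.fromkeys(s.upper() for s in site_variants))
    return [fixed_tail.upper() + site for site in unique_sites]


# Hypothetical placeholder sequences, for illustration only.
FIXED_TAIL = "ACGTACGTACGTACGT"
SITE_VARIANTS = [
    "CTTCGGATCC",  # variant reported for a first species
    "CTTCGGGTCC",  # variant differing by one SNP
    "cttcggatcc",  # duplicate of the first variant, written in lower case
    "CTCCGGATCC",  # variant reported for another species
]

cocktail = build_primer_cocktail(FIXED_TAIL, SITE_VARIANTS)
for primer in cocktail:
    print(primer)
```

In this sketch, adding a newly reported variant to the input list is enough to extend the cocktail, while the fixed tail remains common to all primers.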
The method relies on the information available in databases of genomic diversity. With the increasing use of massively parallel sequencing and falling costs, publicly available sequence information is growing rapidly. The information on the SNPs occurring in the variable portion of the primer cocktail can easily be updated to include additional variants of the same species and variants of new species. Moreover, informatics tools are becoming increasingly available and facilitate the automated analysis and indexing of DNA sequences deposited in online or custom databases without the need for sophisticated informatics know-how [11]. Remarkably, in the case of the COI gene it was computationally predicted that 'universal' fragments of 100 bp and 250 bp should have 90% and 95% probabilities, respectively, of discriminating most animal species [3].
In the application illustrated in this paper, a 226 bp region of CytB was chosen on the basis of online sequence information (Table 1, Table 2, Additional file 1, Additional file 2, Additional file 3) with the objective of designing an assay that identifies the tuna and Scomber species widely used for seafood through their mitochondrial haplotypes. This fragment size proved short enough for efficient PCR amplification of degraded DNA obtained from canned tuna and Scomber samples. Depending on the position of the putative diagnostic SNPs reported online within the 226 bp AB fragment, either fragment B or fragment A can discriminate most members of this group of Scombridae (Additional file 4), with few exceptions for the most closely related Thunnus species (see below). When only the B fragment was efficiently amplified and sequenced, the polymorphism of this small fragment (95 bp) was sufficient to detect gross mislabelling of species content in the most degraded samples, such as tuna salad and tuna sauce (Table 3).
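The notion of a diagnostic SNP used here can be made concrete with a short Python sketch. It assumes that the haplotypes of each species are already aligned and of equal length, and it treats a position as diagnostic for a pair of species when the sets of alleles observed in the two species do not overlap. The function name and the toy sequences are hypothetical and serve only as an illustration; they are not the CytB data analysed in this paper.

```python
from itertools import combinations


def diagnostic_positions(alignment):
    """Map each species pair to its diagnostic positions (0-based).

    `alignment` maps a species name to a list of aligned sequences of equal
    length. A position is reported for a pair when no allele observed in one
    species is observed in the other.
    """
    length = len(next(iter(alignment.values()))[0])
    # Set of alleles observed in each species at each alignment position.
    alleles = {
        species: [{seq[i] for seq in seqs} for i in range(length)]
        for species, seqs in alignment.items()
    }
    result = {}
    for a, b in combinations(sorted(alignment), 2):
        result[(a, b)] = [
            i for i in range(length)
            if alleles[a][i].isdisjoint(alleles[b][i])
        ]
    return result


# Toy aligned haplotypes (hypothetical, for illustration only).
toy_alignment = {
    "species_A": ["ACGTAC", "ACGTAC"],
    "species_B": ["ACGTGC", "ACATGC"],
    "species_C": ["ACGTAC"],
}

positions = diagnostic_positions(toy_alignment)
for pair, sites in positions.items():
    print(pair, sites, "sequences per species:",
          [len(toy_alignment[s]) for s in pair])
```

In the toy data, species_A and species_C share the same haplotype and therefore have no diagnostic position, whereas position 4 separates species_B from both. Printing the number of sequences per species next to each result is a reminder that a diagnostic position supported by very few sequences is only putative.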
The method constitutes a significant improvement over several previously described tests for the control and traceability of tuna-containing food [12–15], which in general are designed for longer target sequences and/or can only discriminate small groups of taxonomically closely related species (e.g. Thunnus spp.). DNA extraction, PCR and sequencing are performed using standard protocols and are thus suited to large-scale analyses, which will help to enrich the database of Scombrid species with new markers of intra- and inter-specific variability.
Some technical and biological caveats of our method need to be considered. From the technical point of view, choosing the best target fragment on the basis of described polymorphisms and carrying out quality control of the online sequences are time-consuming tasks. In our case, some of the target Scombridae species were found to be poorly represented among the CytB sequences that passed quality control, which decreased the power to define diagnostic SNPs (Additional file 4). For example, a single diagnostic SNP could differentiate T. thynnus vs. T. maccoyii and T. orientalis vs. T. alalunga, but only four and three online sequences represented T. maccoyii and T. orientalis, respectively. From the biological point of view, some species are too closely related to be discriminated using short mitochondrial or nuclear markers; in addition, the introgression of mitochondrial genomes between closely related fish species is a well-known phenomenon, e.g. among Scomber colias, S. japonicus and S. australasicus [16] and among species of Thunnus (e.g. between T. thynnus and T. orientalis; [17]). It has recently been shown that longer mtDNA target regions, or the combination of mitochondrial and nuclear markers, allow unambiguous discrimination between the eight species of the Thunnus complex [7, 18]. However, most food validation tests need to be economically sustainable, at least for large-scale product screening. A two-step strategy may therefore be considered: this rapid screening protocol is used first to identify potential mislabelling and fraud, and an additional focussed test is then applied only to the samples flagged as mislabelled. The latter might include the analysis of nuclear regions to exclude mitochondrial introgression between closely related species, or cloning and sequencing of the mitochondrial fragments to discriminate multiple species contents and infer their relative amounts in the most problematic samples.