BMC Biotechnology BioMed Central Methodology article Criteria for effective design, construction, and gene knockdown by

Background: RNA interference (RNAi) technology is a powerful methodology recently developed for the specific knockdown of targeted genes. RNAi is most commonly achieved either transiently by transfection of small interfering (si) RNA oligonucleotides, or stably using short hairpin (sh) RNA expressed from a DNA vector or virus. Much controversy has surrounded the development of rules for the design of effective siRNA oligonucleotides; and whether these rules apply to shRNA is not well characterized. Results: To determine whether published algorithms for siRNA oligonucleotide design apply to shRNA, we constructed 27 shRNAs from 11 human genes expressed stably using retroviral vectors. We demonstrate an efficient method for preparing wild-type and mutant control shRNA vectors simultaneously using oligonucleotide hybrids. We show that sequencing through shRNA vectors can be problematic due to the intrinsic secondary structure of the hairpin, and we determine a strategy for effective sequencing by using a combination of modified BigDye chemistries and DNA relaxing agents. The efficacy of knockdown for the 27 shRNA vectors was evaluated against six published algorithms for siRNA oligonucleotide design. Our results show that none of the scoring algorithms can explain a significant percentage of variance in shRNA knockdown efficacy as assessed by linear regression analysis or ROC curve analysis. Application of a modification based on the stability of the 6 central bases of each shRNA provides fair-to-good predictions of knockdown efficacy for three of the algorithms. Analysis of an independent set of data from 38 shRNAs pooled from previous publications confirms these findings. Conclusion: The use of mixed oligonucleotide pairs provides a time and cost efficient method of producing wild type and mutant control shRNA vectors. 
The addition to sequencing reactions of a combination of mixed dITP/dGTP chemistries and DNA relaxing agents enables read through the intrinsic secondary structure of problematic shRNA vectors. Six published algorithms for siRNA oligonucleotide design that were tested in this study show little or no efficacy at predicting shRNA knockdown outcome. However, application of a modification based on the central shRNA stability should provide a useful improvement to the design of effective shRNA vectors.

Published: 24 January 2006. BMC Biotechnology 2006, 6:7. doi:10.1186/1472-6750-6-7. Received: 20 June 2005. Accepted: 24 January 2006. This article is available from: http://www.biomedcentral.com/1472-6750/6/7

© 2006 Taxman et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background
RNA interference (RNAi) is a naturally occurring phenomenon by which RNA duplexes known as short interfering RNA (siRNA) can reduce gene expression through enzymatic cleavage of a target mRNA mediated by the RNA-induced silencing complex (RISC). The ability of synthetic siRNA to inhibit targeted genes with near specificity makes it an extremely powerful tool for functional genomics that has drawn considerable interest recently [1,2]. RNAi is commonly achieved by introducing chemically synthesized siRNA 19-22-mers into cells by transfection. However, many cells and cell lines are either refractory to or adversely affected by transfection, and the transient nature of this methodology renders it unsuitable for the generation of long-term cell lines of the desirable phenotype. Two alternatives to synthetic siRNA are DNA vector-mediated RNAi production [3][4][5], and most recently viral-mediated siRNA synthesis [6][7][8][9][10]. For the latter technologies, sense and antisense strands can be expressed from different promoters [11]. Alternatively, short hairpin (sh) RNAs, expressed from a single promoter, are processed into siRNAs by Dicer or a homologous double strand RNase [12].
One caveat of siRNA design is that not all 19-22 base RNA duplexes will cleave their target with efficacy, and much effort has gone towards identifying a set of rules for selecting an effective siRNA target site within a gene. Recent findings [13,14] offered the first clue towards the development of guidelines for selecting an siRNA target site. These studies showed that the RISC complex is asymmetric and favors the strand of the siRNA duplex with the least thermodynamically stable 5' terminus. Subsequently, Reynolds et al. designed an algorithm based on statistical data showing patterns of efficacy for siRNA oligonucleotides containing specific residues at defined positions within the 19-mer [15]. A limitation of their study is that a small number of genes were tested. Several additional algorithms for designing effective siRNAs have been published since those initial reports with surprisingly disparate results, making the determination of which residues are generally favorable for siRNA efficacy a point of controversy [16][17][18][19][20]. Additionally, whether any of the algorithms developed for synthetic siRNA oligonucleotides apply to the design of shRNA expressed stably from a vector has not been well explored.
In the present report, we construct and analyze a set of 27 shRNAs for 11 different human genes. To our knowledge this is the largest individual set of data published for shRNA 19-mers. We describe a method for simultaneously preparing wild type and control mutant shRNA vectors that is time and cost efficient, and show that sequencing of shRNA plasmids can be quite problematic due to the intrinsic secondary structure of the hairpin. We examine several different strategies for overcoming this problem including the use of modified BigDye chemistries and the addition of agents known to relax DNA structure. The knockdown efficacy for each of the 27 shRNAs was evaluated against six published algorithms for siRNA oligonucleotide design by linear regression and ROC curve analyses. We describe a modification of three of the algorithms that provides fair-to-good prediction of shRNA efficacy, and confirm the significance of the modified algorithms using a pooled set of shRNAs from previous publications. These findings should be of general applicability in the design and construction of shRNA vectors.

Figure 1. Design for producing wild-type and mutant shRNA vectors simultaneously. A forward strand of the wild-type hairpin (blue) is synthesized together with a reverse strand containing a one bp mutation within both the sense and antisense copy of the target sequence (shown in red). The double stranded hybrid is ligated into the retroviral vector 5' of an H1 promoter and transformed into competent bacteria. Since replication is semi-conservative, the daughter bacteria will be of two different populations that carry either a double-stranded wild-type or a double-stranded mutant vector and can be isolated by preparing and sequencing individual colonies.
Figure 2. Gene expression analysis for wild-type and mutant shRNA vectors prepared simultaneously using wild-type/mutant double stranded hybrids. (A) Sequences of the target sites for four wild-type and mutant shRNA vectors that were prepared simultaneously as detailed in Figure 1.

Design and preparation of shRNA plasmids
To address the question of how shRNA sequence correlates with knockdown efficacy, 27 shRNA vectors from 11 different genes were designed and constructed (Table 1). Target sequences were selected in the coding region of each gene and were designed to broadly conform to the seminal studies of sequence features for siRNA oligomer efficacy [13][14][15]. Accordingly, sequences are low in runs and have a G/C ratio of about 50%. The shRNAs were designed to target sites that are devoid of single nucleotide polymorphisms, and correspond to all splice variants amplified by our real time PCR primer sets.
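The two sequence-level criteria mentioned above (a G/C ratio near 50% and few runs) can be checked with a short script. This is a minimal sketch: it assumes that "runs" means stretches of one repeated base, the function names are hypothetical, and the example 19-mer is invented for illustration rather than taken from Table 1.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G or C bases in a target sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    """Return the length of the longest stretch of one repeated base."""
    seq = seq.upper()
    if not seq:
        return 0
    best = current = 1
    for prev, base in zip(seq, seq[1:]):
        current = current + 1 if base == prev else 1
        best = max(best, current)
    return best


# Hypothetical 19-mer target, used only to exercise the functions.
example_target = "GACTTCAGATGCAAGTTCA"
example_gc = gc_fraction(example_target)
example_run = longest_run(example_target)
```

Any acceptance window applied to these values (for instance a G/C fraction between 0.4 and 0.6) would be a choice made by the user; the text only states "about 50%".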
Since siRNAs can have off-target effects, it is important for functional assays to make a specific mutant with one or more base mismatches within the target recognition site as a control [21]. To conserve time and cost, we have developed a method of making wild-type and mutant shRNA vectors simultaneously (detailed in Methods and Figure 1). Gene knockdown results for four wild-type/mutant shRNA pairs are shown in Figure 2. These results demonstrate the utility of this method in providing a point mutant shRNA vector that can serve as a loss-of-function control for gene knockdown by wild type shRNAs. Though detailed protocols have been published for construction of shRNA vectors [22], this is the first protocol for producing wild-type and mutant vectors simultaneously and should facilitate the implementation of a highly controlled system for shRNA.

Strategy for accurate sequencing through hairpin structures
Verifying the sequence of an shRNA hairpin is essential since mismatch of even one nucleotide within the target sequence can ablate knockdown (Figure 2 and [5,23]). An issue that is frequently encountered in the preparation of shRNA vectors is that many are difficult to sequence due to the intrinsic secondary structure of the hairpin. One strategy recently proposed to overcome this issue involves engineering a restriction site within the loop/stem region of the hairpin to physically separate the inverted repeats by digestion, and then piecing together sequence using sense and antisense primers [24]. However, the ability to achieve sequencing of shRNA constructs without modifying stem/loop sequence would be of clear advantage. To address this possibility, we evaluated modified sequencing reactions for improvement in the read-through of the hairpin secondary structure in three shRNA hairpins.
Modifications include adding agents known to relax DNA structure, including DMSO, Betaine, PCRx Enhancer and ThermoFidelase I; and adding increasing amounts of dGTP chemistry to BigDye (BD) chemistry. Sequencing results for each of the three DNA constructs are summarized in Table 2. Read-through of the hairpin structure was measured as the ratio of the peak height about 300 bases after the hairpin structure to the signal about 50 bases before the hairpin structure. A ratio of 1 indicates no loss in signal and 0 indicates complete loss of read-through. In the absence of any additive to BD chemistry, the hairpin caused a reduction in peak height ratio for our less tightly structured hairpin, pHSPG-shmutTLR4, to 0.4, and a complete loss in read-through for the other two plasmids. This can be visualized as an abrupt stop in the sequence peak profile for pHSPG-shTLR4 (Figure 3A).
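The read-through measure described above can be sketched as a small function. This is an illustrative sketch only: the function name, the representation of a trace as a list of per-base peak heights, and the averaging window around each sampling position are assumptions, since the text specifies only the approximate positions (about 50 bases before and about 300 bases after the hairpin).

```python
from statistics import mean


def read_through_ratio(peak_heights, hairpin_start, hairpin_end,
                       before=50, after=300, window=5):
    """Ratio of signal ~300 bases after the hairpin to signal ~50 bases before.

    peak_heights: per-base peak heights from one sequencing trace.
    hairpin_start, hairpin_end: 0-based indices bounding the hairpin.
    window: bases averaged on each side of a sampling position (assumption).
    """
    pre = hairpin_start - before
    post = hairpin_end + after
    if pre - window < 0 or post + window >= len(peak_heights):
        raise ValueError("trace is too short for the requested positions")
    pre_signal = mean(peak_heights[pre - window:pre + window + 1])
    post_signal = mean(peak_heights[post - window:post + window + 1])
    if pre_signal <= 0:
        raise ValueError("no signal before the hairpin")
    return post_signal / pre_signal


# Synthetic trace: full signal up to the hairpin, 40% signal afterwards.
synthetic_trace = [1000.0] * 100 + [400.0] * 500
synthetic_ratio = read_through_ratio(synthetic_trace, 100, 150)
```

On this synthetic trace the ratio is 0.4, the same scale on which 1 means no loss of signal and 0 means complete loss of read-through.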
Among the DNA relaxing agents, 5% DMSO, 0.83 M Betaine and 1 × PCRx Enhancer each improved the sequence read significantly for some constructs. However, the addition of 0.83 M Betaine plus 1 × PCRx Enhancer to BD chemistry was found to sequence most consistently, with peak height ratios of 0.5-0.9 (Table 2 and Figure 3B). The addition of 10:1 BD:dGTP chemistries alone also improved read-through somewhat, with peak height ratios of 0.5-0.6 (Table 2 and Figure 3C). The sub-optimal peak height ratio for 10:1 BD:dGTP can be attributed to a visible step in the sequence peak profile after the secondary structure region where the signal is reduced (Figure 3C, arrow). Increasing the dGTP chemistry content to 5:1 and 3:1 BD:dGTP or using straight dGTP chemistry increased the peak height ratio and reduced the step somewhat (0.6 to 0.8 ratio). However, the mixed incorporation of dITP and dGTP resulted in worse peak broadening as the amount of dGTP used increased [see Additional file 1], and dGTP-only chemistry caused severe sequence compressions (data not shown). The best overall results were observed by combining Betaine plus PCRx and 10:1 BD:dGTP mixed chemistries together. This combination reduced the step with less peak broadening and increased peak height ratios to 0.9-1.0 (Table 2 and Figure 3D). ThermoFidelase I, a DNA destabilizing enzyme that is frequently used to improve sequencing of genomic DNA [25,26], did not improve sequencing of any of the three hairpins in straight BD chemistry (data not shown), and actually reduced the peak height ratio significantly in 10:1 BD:dGTP chemistries for all three shRNA constructs, causing the reappearance of a stop at the hairpin structure (Table 2 and Figure 3E).

Figure 3. DNA sequencing of pHSPG-shTLR4 using modified reaction conditions.
In summary, the combination of 10:1 BD:dGTP chemistries, 0.83 M Betaine, and 1 × PCRx Enhancer provided optimal sequencing, and mixed BD:dGTP chemistries, Betaine, PCRx Enhancer, and DMSO each had some positive effects on their own. ThermoFidelase I, however, probably should be avoided for shRNA vectors with difficult intrinsic secondary structure.

Correlation between shRNA knockdown efficiency and published algorithms for siRNA design
To determine whether the efficacy of knockdown by shRNA vectors correlates with published rules for the design of effective siRNA oligonucleotides, shRNAs were evaluated for their ability to knock down gene expression. The shRNAs were transduced stably into either THP1 or Jurkat human cell lines as detailed in Table 3, first two columns. The average knockdown was determined from RNA collected on three or more different days and is listed for each shRNA (Column 3). Knockdown was shown to be reproducible for cell lines that were independently transduced and sorted, suggesting that knockdown is a function of the shRNA target sequence rather than features of the viral transduction [see Additional file 2]. More than one third of the shRNA vectors constructed were unable to suppress transcription (<10% in Column 3), despite comparable growth rates and long term expression of the GFP marker at high levels in these cell lines. Furthermore, great variations in knockdown efficacy for several shRNAs made against many of the same genes (i.e., CLR16.2, CLR19.3 and TLR4) argue against any simple biological reasons for differences in efficacy for these genes. Many of the ineffective shRNAs have negative 5' ∆∆G values and high Reynolds scoring, each of which has been hypothesized to correlate with siRNA knockdown efficacy (Table 3, Columns 4 and 5) [13][14][15]. Conversely, among the shRNAs that were able to confer gene knockdown, several had either positive 5' ∆∆G values or low Reynolds scores. These findings indicate that 5' ∆∆G and the Reynolds scoring algorithm for siRNA may not provide positive correlative criteria for shRNA design.

Figure 4. Correlation between shRNA knockdown efficacy and scoring for six published algorithms for siRNA.
To determine whether other published algorithms for siRNA oligonucleotide design can be applied to shRNA vectors, each of the shRNA target sites was evaluated by four additional algorithms, and scores were plotted against the percent knockdown for each shRNA (Table 3, Columns 6-9 and Fig. 4). For each algorithm plot a best fit line was drawn and the R² value calculated as an indication of whether the variance in knockdown efficacy can be explained by the algorithm scoring. Results confirm a poor association between shRNA efficacy and either 5' ∆∆G (free energy differential) considerations [13] or the Reynolds et al. algorithm [15], and also demonstrate a poor association with the Hsieh et al. algorithm [19], with each in fact showing a weak reverse correlation with the data. The algorithms of Amarzguioui et al. [20], Ui-Tei et al. [18], and Takasaki et al. [17] correlate directly with shRNA efficacy. However, none of the algorithm scores explain a significant percentage of the variance in knockdown efficacy. Among the algorithms tested, the Takasaki et al. scoring system shows the highest association, with an R² value of 0.0251.
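The best-fit line and R² value used here follow from ordinary least squares. The sketch below is a plain-Python illustration of that calculation and is not the software used in the study; the function name and the toy inputs are hypothetical.

```python
def best_fit_r_squared(scores, knockdowns):
    """Least-squares line of knockdown on score; returns (slope, intercept, r2)."""
    n = len(scores)
    if n != len(knockdowns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(scores) / n
    mean_y = sum(knockdowns) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in knockdowns)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(scores, knockdowns))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either series is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Toy data only: a perfectly linear relationship.
toy_slope, toy_intercept, toy_r2 = best_fit_r_squared([1, 2, 3, 4], [2, 4, 6, 8])
```

A negative slope corresponds to the reverse correlation described above, and an R² close to 0 means the score explains almost none of the variance in knockdown.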
Because these results suggest that a linear relationship does not strongly apply to shRNA knockdown for any of the six algorithms, we evaluated each of the algorithms by ROC curve analysis to determine whether any algorithm is superior to the others at identifying effective shRNAs. The ROC curve is a plot of sensitivity (the true positive fraction, TPF) versus 1 minus the specificity (the false positive fraction, FPF) that is generated by varying the decision threshold between the minimum and maximum algorithm score. The diagonal of the ROC plot represents the ROC curve for an algorithm that is no better at discrimination than random selection. Algorithms that are poor discriminators have ROC curves that track along the diagonal and have an area under the ROC curve (AUC) that is not significantly different from the AUC of the diagonal (0.5). Algorithms that are good discriminators have ROC curves with strong convex deviation from the diagonal and AUCs that approach 1 and are significantly different from the AUC of the diagonal.
The Hsieh et al. algorithm had a concave ROC curve (Fig. 5A), indicating unacceptable sensitivity and specificity in discriminating effective from ineffective shRNAs. The ROC curves for all other algorithms (Figs. 5B-F) tracked near the diagonal of the ROC plot and had AUCs that were not significantly different from the AUC of the diagonal. Thus, none of the algorithms showed a statistically significant ability to discriminate between effective and ineffective shRNAs.
The Takasaki et al. algorithm (Fig. 5F) showed the most promise as a discriminator of effective from ineffective shRNAs. However, this algorithm suffered from a relatively high false positive fraction for decision thresholds near the maximum score as indicated by the weak, erratic deviation from the diagonal near the origin of the ROC curve (Fig. 5F). This indicated that the algorithm assigned a high score to a number of ineffective shRNAs. Inspection of the data revealed that two of the three high-scoring ineffective shRNAs targeted genes whose expression was successfully knocked down by other shRNAs (Table 3, asterisks). Thus it is unlikely that the inefficacy of the shRNAs is a consequence of selective pressure against the stable suppression of gene expression. It is more likely that the Takasaki et al. algorithm does not account for a critical feature of effective shRNAs.

Application of an algorithm modification based on the stability of the 6 central bases of each shRNA
Inspection of the physical properties of the high scoring ineffective shRNAs revealed that the average stability of the duplex formed by the 6 central bases of the shRNAs (bases 6-11 of the sense strand hybridized to bases 9-14 of the antisense strand) was greater than the average stability of high scoring effective shRNAs (∆G = -13.1 ± 0.1 versus -11.1 ± 1 kcal/mol, respectively). Based on this observation, the Takasaki et al. algorithm was modified such that shRNAs with a central duplex ∆G equal to or less than -12.9 kcal/mol were assigned a minimum score (Table 4). This modification assigned minimum scores to five shRNAs, four of which were ineffective, thus increasing the specificity of the algorithm without a significant loss in sensitivity. The minimum score assigned to one effective shRNA (71% knockdown) indicates that other properties in addition to central duplex stability influence efficacy. Nevertheless, the addition of this modification eliminated the weak erratic deviation of the ROC curve from the diagonal for high decision thresholds and increased the AUC to 0.79 (Fig. 5I).

Because minimizing the false positive rate is the primary concern in shRNA design, we recommend using the modified Ui-Tei et al. algorithm, which had the lowest false positive fraction at decision thresholds near the maximum score, as indicated by the strong deviation from the diagonal near the origin of the ROC curve (Figs. 5H and 5K). Using a decision threshold of 3 limits selection of shRNAs to a region of the ROC curve where the sensitivity was acceptable (0.28-0.33), while the specificity was very good (1.0). By setting this decision threshold, the false positive fraction was minimized, while 28-33% of the effective shRNAs were identified from our shRNAs and the published set of shRNAs, respectively. Should the sensitivity need to be increased, we recommend using a decision threshold of 2. This threshold had a sensitivity of 0.54-0.55 and a specificity of 0.88-0.9. If the decision threshold was further relaxed to 0, the sensitivity increased to 0.86-0.9, but the specificity fell to 0.55-0.54. We recommend using the highest of these decision thresholds possible.
Though statistically small, this study has the advantage, to our knowledge, of being the largest published set of 19-mer based shRNAs to date. In addition, unlike other shRNA studies that are necessarily skewed toward effective shRNAs, our study includes both functional and non-functional shRNAs. We have shown that modified Ui-Tei et al., Amarzguioui et al. and Takasaki et al. algorithms are fair to good predictive tools that distinguish effective from ineffective shRNAs. However, significant shortcomings still exist in the modified algorithms. A direct assessment of the algorithm modifications using shRNAs designed according to each original and modified algorithm would lend support to these findings. These algorithms are meant to reduce the number of false positive shRNAs selected, not to eliminate them altogether, and thus such an assessment would require a large number of shRNAs to obtain a statistically significant difference in false positive rate. The availability of larger shRNA data sets should support the development of algorithms with improved sensitivity and specificity. Additionally, several software applications for siRNA oligonucleotide design that were not considered in this study may be of use in the design of shRNAs [16,[34][35][36]. Criteria for designing functional siRNA oligonucleotides remain controversial as evidenced by the large number of studies still being devised for siRNA design, and since we did not test these sequences as siRNAs it cannot be established whether the modification of these algorithms also applies in the context of siRNA oligonucleotides. shRNA has an added layer of complexity over siRNA oligonucleotides since the hairpin needs to be processed within the cell before entering the RISC complex. Moreover, selective pressure against the stable expression of shRNAs that are deleterious to cell growth would be expected to lend an additional constraint to the stable expression of certain shRNAs. Despite these complexities, our findings begin to bring insight into the ability to apply siRNA algorithms for design of functional shRNAs.

Conclusion
We have provided several important strategies that should facilitate the generation of effective shRNA vectors for gene knockdown in mammalian cells. The ability to produce wild-type and mutant shRNA vectors simultaneously using mixed oligonucleotide pairs provides an efficient method to generate a specific control vector with little added time or cost. This strategy should be particularly useful in generating specific controls in high throughput applications. Difficulty in sequencing through the high intrinsic secondary structure of some hairpin vectors also has presented a major constraint in the construction of shRNA vectors, and the knowledge that sequencing issues can be resolved by modifying BigDye chemistries and adding Betaine and other DNA relaxing agents should be valuable regardless of the method of shRNA design and construction. Using data from 27 shRNAs that we have constructed, we have analyzed the ability of published algorithms for siRNA oligonucleotide target selection to predict knockdown efficacy. Our results show that shRNA efficacy cannot strictly be explained by any of the six algorithms tested. We provide a modification, however, that greatly improves the predictability of the Ui-Tei et al., Amarzguioui et al. and Takasaki et al. algorithms. Results were confirmed using data from 38 previously published shRNAs. These findings should be of significant applicability in the design and preparation of functional shRNAs.

Methods
Cell lines and cell culture
THP1 monocytic cell and Jurkat T cell lines were cultured in RPMI, 10% FCS. Cultures were maintained between 2 and 8 × 10⁵ cells/ml and standardized to equivalent densities before assessing knockdown efficiencies.

Plasmid design and construction
Retroviral vectors for shRNA expression have a pHSPG backbone [37] with an inserted H1 RNA promoter driving shRNA expression. The pHSPG vector also has a green fluorescent protein (GFP) gene driven by a phosphoglycerate kinase promoter as a marker. The H1 promoter and shRNA expression cassette were inserted into the pHSPG vector by one of two methods. In the first method, a double stranded oligomer is synthesized with Bgl II and Xho I half sites on the ends. This is prepared as either a matched pair or a wild-type/mutant hybrid (Fig. 1). To prepare wild-type and mutant shRNA vectors simultaneously, a forward strand oligomer is synthesized that contains the wild-type hairpin. In parallel, a mutant reverse strand with a one bp mismatch within the target sequence is also synthesized. Despite the mismatches between the forward wild-type and reverse mutant strands, annealing can still occur efficiently under optimized conditions. The ds oligonucleotide is annealed by combining 1000 pmol of each oligomer strand in 50 µl of annealing buffer (100 mM potassium acetate, 30 mM HEPES-KOH, pH 7.4, 2 mM Mg-acetate). The mixture is boiled for five minutes and then cooled slowly to 4°C. The annealed double stranded oligomer is ligated into Bgl II and Xho I half sites 3' of the H1 promoter that is inserted into the 3' long terminal repeat (LTR) of pHSPG generating a self-inactivating LTR. The double stranded hybrid is ligated into the vector 5' of a pol III promoter and is transformed into competent bacteria. Since replication is semi-conservative, the daughter bacteria will be of two different populations that carry either a double-stranded wild-type or a double-stranded mutant vector. Bacteria carrying either wild-type or mutant vectors can then be isolated from individual colonies and sequenced. 
Oligos used for this method had the sequence: GATCCCC-N19-TTCAAGAGA-rN19-TTTTTGGAAA; and TCGATTTCCAAAAA-N19-TCTCTTGAA-rN19-GGG (where N19 is the sense of the target sequence and rN19 is the antisense). We have routinely used DH5α to prepare wild-type and mutant shRNA vectors with approximately equal yields of each type of vector; however, a repair-deficient E. coli mutant could theoretically improve the efficiency of simultaneous construction.
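The oligo templates given above can be filled in programmatically for a chosen 19-mer. The sketch below is illustrative: the function names are hypothetical, the example target is invented, and the position and identity of the point mutation are arbitrary choices, since the text specifies only that a one bp mutation lies within the target sequence. The templates themselves are copied from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_target(n19: str) -> str:
    n19 = n19.upper()
    if len(n19) != 19 or set(n19) - set("ACGT"):
        raise ValueError("target must be 19 bases of A, C, G or T")
    return n19


def forward_oligo(n19: str) -> str:
    """GATCCCC-N19-TTCAAGAGA-rN19-TTTTTGGAAA"""
    n19 = check_target(n19)
    return "GATCCCC" + n19 + "TTCAAGAGA" + reverse_complement(n19) + "TTTTTGGAAA"


def reverse_oligo(n19: str) -> str:
    """TCGATTTCCAAAAA-N19-TCTCTTGAA-rN19-GGG"""
    n19 = check_target(n19)
    return "TCGATTTCCAAAAA" + n19 + "TCTCTTGAA" + reverse_complement(n19) + "GGG"


def point_mutant(n19: str, position: int, new_base: str) -> str:
    """Replace the base at a 1-based position of the sense target."""
    n19 = check_target(n19)
    new_base = new_base.upper()
    if new_base not in "ACGT" or len(new_base) != 1:
        raise ValueError("new_base must be one of A, C, G, T")
    if n19[position - 1] == new_base:
        raise ValueError("new_base must differ from the wild-type base")
    return n19[:position - 1] + new_base + n19[position:]


def hybrid_pair(n19: str, position: int, new_base: str):
    """Wild-type forward strand plus mutant reverse strand."""
    return forward_oligo(n19), reverse_oligo(point_mutant(n19, position, new_base))


# Hypothetical target and mutation, for illustration only.
wt_forward, mut_reverse = hybrid_pair("GACTTCAGATGCAAGTTCA", 10, "A")
```

Because the mutation appears in both the N19 and rN19 copies of the reverse strand, the annealed hybrid carries two mismatched positions, one in each arm of the hairpin, which matches the design shown in Figure 1.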
A second design involves PCR using a primer complementary to the 5' end of the H1 promoter together with an shRNA-specific long primer whose 3' end is complementary to the 3' end of the H1 promoter. PCR is performed using Pfx polymerase with PCRx enhancer (this combination has proved essential for reducing the number of mutations introduced within the amplified region). Oligos used for this method were: GCGGCCGCGATATC-GAACGCTGACGTCATCAACCC (universal oligo); and TGCTCTAGAAAAA-N19-TCTCTTGAA-rN19-GGGAAA-GAGTGGTCTCATACAGAACTTATAAGATTCC, where N19 is the sense of the target sequence and rN19 is the antisense. Sequences complementary to the H1 promoter are underlined. PCR fragments were digested with EcoRV and XbaI and ligated into the 3' LTR of pHSPG. All constructs were verified by sequencing.

Ratios of 20:1, 10:1, 5:1 and 3:1 BD:dGTP chemistries and straight dGTP chemistry were used. Additives evaluated in sequencing reactions were: 0.83 M Betaine (Sigma part # B-0300), 5% DMSO (Sigma part # D-2650), 1 × PCRx Enhancer (in Invitrogen kit part # 11495-017), 1 × (1 µL ThermoFidelase/20 µL sequencing reaction) ThermoFidelase I (Fidelity Systems) and 10 × primer concentration. The thermal cycler protocol used for cycle sequencing was: 95°C for 3 minutes (or 5 minutes when using ThermoFidelase I) followed by 25 cycles of 98°C for 40 seconds (1st cycle) or 10 seconds (subsequent cycles), 50°C for 5 seconds and 60°C for 4 minutes. Sequencing reactions were purified using Centri-Sep 96 well spin plates (Princeton Separations), and the purified reaction products were run on a 3730 DNA Analyzer (Applied Biosystems) with a 50 cm array using the LongRead protocol. As a measure of read-through efficacy, peak height ratios were determined about 300 bases after and 50 bases before the hairpin.

Virus preparation, transduction and cell sorting
To prepare virus, pHSPG-shRNA plasmids were co-transfected into 293T cells with gag/pol and VSVg vectors by the calcium phosphate method. Viral supernatants were collected 24 and 48 hours following transfection and used to transduce THP1 or Jurkat cells by spinoculation. THP1 cells were transduced with virus on two consecutive days to increase transduction levels. Following approximately one week of culture, stably transduced cells were isolated by sorting for GFP. FACS analysis studies suggest that GFP expression is 95% stable for at least two months following sorting (not shown).

RNA expression analyses
Total RNA was isolated with an RNeasy isolation kit (Qiagen) using the recommended protocol. To increase specificity, cDNA was reverse transcribed using oligo dT primer and Superscript III RT (GibcoBRL). Real-time PCR experiments were performed using an AB Prism 7700 instrument (Applied Biosystems) with 57°C annealing temperature. Primers were designed to span exon/intron junctions where possible. All RNA expression analyses were done at least in triplicate for RNA isolated on different days, and knockdowns were verified with at least one control hairpin. Values represent average observed knockdown for RNA from different days of cell culture and were standardized to 18S rRNA expression.

Implementation of algorithms
The free energy (∆G) of RNA duplex formation for the 5 bases at the 5' end of the sense and antisense strands was determined using the thermodynamic parameters and expanded nearest-neighbor model of Xia et al. [38]. The 5' ∆∆G (differential free energy) was calculated by subtracting the ∆G of the antisense strand from that of the sense strand. Determination of scores for the Reynolds et al., Amarzguioui et al., and Takasaki et al. algorithms was as described [15,17,20]. The Hsieh et al. score represents the interpretation of the Hsieh et al. design criteria as published by Saetrom and Snove [16,19]. For the Ui-Tei et al. algorithm, sequences with a C or G on the 5' end scored 1 point, whereas those with an A or T scored -1 point. Sequences with an A or T on the 3' end scored 1 point, whereas those with a C or G scored -1 point. Sequences with 5 or more A or T bases in the seven 3' bases scored 2 points, whereas those with 4 A or T bases scored 1 point. Sequences can be classified by score as follows: 4, class Ia; 3, class Ib; 2, 1 or 0, class II; and -1 or -2, class III. All knockdowns of <10% are graphed as 0. The scores for shRNAs with central duplex ∆Gs greater than -12.9 kcal/mol were left unchanged. The cutoff value of -12.9 kcal/mol was selected empirically based upon the range of central duplex ∆Gs for all shRNAs (see Table 4).
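The Ui-Tei et al. scoring rules and the central-stability modification, as worded above, translate directly into code. This is a sketch under stated assumptions: the rules are applied to the 19-mer target written 5' to 3' as in the text, the central duplex ∆G is supplied by the caller (it is not computed here), the function names are hypothetical, and the default minimum score of -2 is simply the lowest value the Ui-Tei rules can produce.

```python
def ui_tei_score(seq: str) -> int:
    """Score a target sequence with the Ui-Tei rules as described in the text."""
    seq = seq.upper().replace("U", "T")
    if len(seq) < 7:
        raise ValueError("sequence is too short to score")
    score = 1 if seq[0] in "GC" else -1      # 5' end: C or G scores +1
    score += 1 if seq[-1] in "AT" else -1    # 3' end: A or T scores +1
    at_in_tail = sum(base in "AT" for base in seq[-7:])
    if at_in_tail >= 5:
        score += 2
    elif at_in_tail == 4:
        score += 1
    return score


def ui_tei_class(score: int) -> str:
    """Map a score to the classes listed in the text."""
    if score == 4:
        return "Ia"
    if score == 3:
        return "Ib"
    if score in (0, 1, 2):
        return "II"
    return "III"


def apply_central_stability(score, central_dg, cutoff=-12.9, minimum_score=-2):
    """Assign the minimum score when the central duplex dG is <= the cutoff.

    central_dg is in kcal/mol; scores of shRNAs above the cutoff are unchanged.
    """
    return minimum_score if central_dg <= cutoff else score


# Hypothetical target sequence, for illustration only.
example_score = ui_tei_score("GACTTCAGATGCAAGTTCA")
```

For another algorithm, `minimum_score` would need to be set to that algorithm's own lowest score.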

ROC curve analysis
ROC curves were constructed as described [39]. ROC analysis requires that each shRNA is classified as either effective or ineffective. For our analyses, a shRNA was classified as effective if it reduced mRNA expression by 50% or more. A ROC curve was generated for each algorithm as follows. The decision threshold was set to one unit below the lowest shRNA score. By definition shRNAs with scores greater than or equal to the decision threshold were predicted to be effective, while those with scores less than the decision threshold were predicted to be ineffective. Then each shRNA was classified as a true positive (effective predicted to be effective), a false negative (effective predicted to be ineffective), a true negative (ineffective predicted to be ineffective) or a false positive (ineffective predicted to be effective). The true positive fraction (TPF) for the decision threshold was calculated as the number of true positives divided by the sum of the true positives and false negatives. The false positive fraction (FPF) was calculated as the number of false positives divided by the sum of the false positives and true negatives. The decision threshold was increased by one unit and the TPF and FPF calculated again. This process was repeated until the decision threshold was one unit greater than the highest scoring shRNA. ROC curves were constructed by plotting TPF versus the FPF for all decision thresholds. The area under the ROC curve was estimated by integration using the trapezoid rule.
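The ROC procedure described above can be written out step by step. The sketch below follows the stated steps (unit steps of the decision threshold from one unit below the lowest score to one unit above the highest, a 50% knockdown cutoff for "effective", and trapezoid-rule integration); the function names and toy data are hypothetical, and it is not the software used for Figure 5.

```python
import math


def roc_points(scores, knockdowns, effective_cutoff=50.0, step=1.0):
    """Return (FPF, TPF) pairs, one per decision threshold."""
    effective = [k >= effective_cutoff for k in knockdowns]
    n_pos = sum(effective)
    n_neg = len(effective) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both effective and ineffective shRNAs")
    low = min(scores) - step
    high = max(scores) + step
    n_steps = int(math.ceil((high - low) / step))
    points = []
    for i in range(n_steps + 1):
        threshold = low + i * step
        # Scores >= threshold are predicted to be effective.
        tp = sum(s >= threshold and e for s, e in zip(scores, effective))
        fp = sum(s >= threshold and not e for s, e in zip(scores, effective))
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Integrate TPF over FPF with the trapezoid rule."""
    ordered = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]))


# Toy data: higher scores coincide with stronger knockdown.
toy_points = roc_points([1, 2, 3, 4], [0, 20, 60, 80])
toy_auc = area_under_curve(toy_points)
```

An AUC near 0.5 corresponds to the diagonal of the ROC plot, that is, an algorithm that discriminates no better than random selection.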