The utility of the small-intelligent strategy for constructing two-site libraries was also tested in combination with an overlapping PCR approach. As case studies, two active-site residues were randomized in HheA (V136/L141) and in HheC (P135/F136). For these, two sets of overlapping single-stranded primers with one mutation per set were synthesized: eight oligonucleotides in total for the two nonadjacent sites V136/L141 of HheA, and 17 oligonucleotides in total for the two contiguous sites P135/F136 of HheC (Figure 1, Step II, Option B). To evaluate the quality of the constructed libraries, 60 randomly picked colonies were sequenced (Figure 2, E and F). Again, all colonies carried mutations at the target sites, and no rare codons or stop codons were found in either case. Moreover, among the sequenced colonies, fewer than 10 identical variants were obtained, suggesting proper priming of the designed small-intelligent primers. Glu was not detected in the HheA V136/L141 library (Figure 2E), whereas it was detected in the other two small-intelligent libraries (Figure 2, D and F), suggesting that the residual bias observed in Figure 2E might be a random effect. Thus, the results demonstrate that the small-intelligent system can also be applied to randomize two sites.
Software development for assisting in degenerate primer design
The design of an optimal degenerate primer set is central to the small-intelligent system. Several computational programs, such as LibDesign and AA-Calculator, have been developed for designing sets of degenerate codons that reduce codon redundancy in saturation mutagenesis libraries (24,25). However, they do not fully meet the criteria for small-intelligent primer design. For example, with AA-Calculator, codon redundancy, stop codons, and the rare codons of the host cell cannot be completely eliminated from the resulting library in most cases, which could affect the quality of the constructed library.
In this work, we developed a software tool called DC-Analyzer for the design of small-intelligent primers. For one degenerate codon XYZ (where X, Y, and Z each represent one of the 15 degenerate nucleotides; see Table 1), there are 15^3 (3375) possible variants. After eliminating the variants that cover any of the three stop codons or the eight rare codons of E. coli, the remaining 1279 possible variants can be used for small-intelligent primer design. DC-Analyzer explores schemes iteratively, increasing the number of degenerate codons by one in each round. In round k, DC-Analyzer exhaustively adds one of the 1279 variants of XYZ to each of the schemes (XYZ)_1 + … + (XYZ)_{k-1} produced in the previous round, without repetition, and thereby generates all possible schemes (XYZ)_1 + … + (XYZ)_k for this round. Each scheme (XYZ)_1 + … + (XYZ)_k contains k degenerate codons and maps to a tuple (out_k, fq_k, ra_k, na_k), where out_k is 0 when the scheme encodes some undesired amino acids and 1 otherwise; fq_k is 0 when the codons covered by the scheme do not all occur with the same frequency and 1 otherwise; na_k is the number of distinct amino acids encoded by the scheme; and ra_k is the ratio of na_k to the number of codons covered by the scheme. DC-Analyzer ranks all the generated schemes by their tuples in lexicographical order, where larger is better.
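The scoring described above can be illustrated with a minimal Python sketch. It is not the DC-Analyzer source code; the function names (`expand`, `candidates`, `score`) are hypothetical, and several points are assumptions made for illustration: fq is read as "every covered codon occurs the same number of times across the scheme", ra is computed over distinct covered codons, and the excluded codons are supplied by the caller because the eight rare E. coli codons are not listed in this section. The example scheme NDT + VMA + ATG + TGG is used purely as an illustration.

```python
from collections import Counter
from itertools import product

# The 15 IUPAC nucleotide symbols (all non-empty subsets of A, C, G, T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(dc):
    """List the concrete codons covered by a degenerate codon such as 'NDT'."""
    return ["".join(p) for p in product(*(IUPAC[s] for s in dc))]


def candidates(excluded):
    """Degenerate codons XYZ that cover none of the excluded concrete codons."""
    all_dcs = ("".join(t) for t in product(IUPAC, repeat=3))  # 15**3 variants
    return [dc for dc in all_dcs if not set(expand(dc)) & set(excluded)]


def score(scheme, desired):
    """Map a scheme (list of degenerate codons) to the tuple (out, fq, ra, na)."""
    # Assumption: count how often each concrete codon is covered by the scheme.
    counts = Counter(c for dc in scheme for c in expand(dc))
    encoded = {CODON_TABLE[c] for c in counts}
    out = int(encoded <= set(desired))          # 1 if nothing undesired is encoded
    fq = int(len(set(counts.values())) == 1)    # 1 if all codon counts are equal
    na = len(encoded)                           # distinct amino acids encoded
    ra = na / len(counts)                       # amino acids per distinct codon
    return (out, fq, ra, na)


ALL20 = set("ACDEFGHIKLMNPQRSTVWY")
print(score(["NDT", "VMA", "ATG", "TGG"], ALL20))  # (1, 1, 1.0, 20)
```

Because Python compares tuples lexicographically, schemes can be ranked directly by these tuples, for example with `sorted(schemes, key=lambda s: score(s, ALL20), reverse=True)`.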
To make DC-Analyzer more efficient, after the ranking it preserves at most the top 2000 schemes for the next round of exploration. This significantly reduces the number of schemes that are generated. Moreover, to avoid generating the same scheme more than once, DC-Analyzer enumerates all 1279 degenerate codons in a fixed order and requires that the order numbers of the degenerate codons added to a scheme be strictly increasing. The order of degenerate codons is defined as follows: the order number of (XYZ)_i is less than that of (XYZ)_j if (XYZ)_i covers a superset of the codons covered by (XYZ)_j, or covers more codons than (XYZ)_j. DC-Analyzer was shown to find the best schemes for encoding the desired amino acids with minimal degeneracy in around 20 s on a single CPU core (T2300, 1.66 GHz) of a Dell Inspiron 640m computer.
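The pruned, round-by-round exploration can be sketched in Python as follows. This is a hedged illustration rather than the published implementation: `beam_search` and its parameters are hypothetical names, the candidate list is passed in by the caller (in practice it would be the filtered set of degenerate codons), and the stopping rule is an assumption, since the text does not state one. Here the search keeps the best-scoring scheme that encodes exactly the desired amino acids and stops early once that scheme has no redundancy. Sorting candidates by decreasing number of covered codons satisfies both ordering conditions, because a strict superset always covers more codons.

```python
from collections import Counter
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(dc):
    """List the concrete codons covered by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[s] for s in dc))]


def score(scheme, desired):
    """Tuple (out, fq, ra, na); the reading of fq and ra is an assumption."""
    counts = Counter(c for dc in scheme for c in expand(dc))
    encoded = {CODON_TABLE[c] for c in counts}
    na = len(encoded)
    return (int(encoded <= desired),
            int(len(set(counts.values())) == 1),
            na / len(counts),
            na)


def beam_search(desired, candidates, max_rounds=6, beam_width=2000):
    """Grow schemes one degenerate codon per round, keeping the top beam_width."""
    desired = set(desired)
    # Fixed order: codons covering more concrete codons get smaller numbers.
    order = sorted(candidates, key=lambda dc: (-len(expand(dc)), dc))
    beam, best = [()], None
    for _ in range(max_rounds):
        grown = []
        for scheme in beam:
            # Strictly increasing order numbers prevent duplicate schemes.
            start = scheme[-1] + 1 if scheme else 0
            for idx in range(start, len(order)):
                new = scheme + (idx,)
                grown.append((score([order[i] for i in new], desired), new))
        if not grown:
            break
        grown.sort(reverse=True)       # lexicographic ranking, larger is better
        grown = grown[:beam_width]     # preserve only the top schemes
        for sc, scheme in grown:
            complete = sc[0] == 1 and sc[3] == len(desired)
            if complete and (best is None or sc > best[0]):
                best = (sc, scheme)
        # Assumed stopping rule: a complete scheme without redundancy is optimal.
        if best is not None and best[0][1] == 1 and best[0][2] == 1.0:
            break
        beam = [scheme for _, scheme in grown]
    if best is None:
        return None
    return best[0], [order[i] for i in best[1]]


ALL20 = set("ACDEFGHIKLMNPQRSTVWY")
demo = ["NNK", "NDT", "VHG", "VMA", "ATG", "TGG"]  # small illustrative pool
print(beam_search(ALL20, demo))
```

With the small illustrative pool above, the three-codon scheme NDT + VHG + TGG already encodes all 20 amino acids but uses 22 codons, so the sketch continues to the next round and returns the redundancy-free four-codon scheme instead. Running the same sketch over the full candidate set would be much slower in pure Python than the timing reported for DC-Analyzer.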