DNA from both processes was combined at an estimated molar ratio of 20:1 in favor of material from the random fragment procedure, so that for each full-length insert, approximately 20 random fragments of that same insert would be represented in the library (DNA derived from sonication and CviJI digestion was combined at an equal weight ratio). Since the <1 kb fraction was not included in the random fragment material, additional DNA from the full-length <1 kb fraction was added to the final pool to ensure equimolar representation of all clones. A large-scale cotransformation of this final DNA pool together with linearized AD vector DNA into yeast generated our RAFL Y2H library. The incubation time before harvesting the yeast colonies was kept as short as possible to minimize loss of normalization of the library due to differential colony growth rates.
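The pooling step above is stated as a molar ratio, whereas DNA is usually measured by weight. The following minimal sketch shows how a target molar ratio could be converted into a weight ratio. It assumes double-stranded DNA whose mass is proportional to its length, so the per-base-pair mass cancels out. The function name and the mean lengths in the usage line (500 bp fragments, 1500 bp full-length inserts) are hypothetical illustration values and are not taken from the text.

```python
def weight_ratio_for_molar_ratio(molar_ratio, mean_len_a_bp, mean_len_b_bp):
    """Return the weight ratio (pool A : pool B) that gives the requested
    molar ratio (pool A : pool B).

    Assumption: both pools are double-stranded DNA, so the mass of a molecule
    is proportional to its length in base pairs and the mass per base pair
    cancels out of the ratio.
    """
    if molar_ratio <= 0 or mean_len_a_bp <= 0 or mean_len_b_bp <= 0:
        raise ValueError("ratio and lengths must be positive")
    return molar_ratio * mean_len_a_bp / mean_len_b_bp


# Hypothetical example: 20:1 molar excess of random fragments (pool A, mean
# 500 bp) over full-length inserts (pool B, mean 1500 bp).
ratio = weight_ratio_for_molar_ratio(20, 500, 1500)
print(f"weight ratio fragments:full-length = {ratio:.2f}:1")
```

Because random fragments are shorter than the inserts they derive from, the weight ratio needed is smaller than the molar ratio; the actual value depends on the measured size distributions of the two pools.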
Our library construction protocol can easily be tailored to any other clone collection one would like to use as input material. The only parameter that needs to be adapted is the identity of the vector-specific PCR primer pair(s). In the future, as additional protein-coding clones become available, the new clones can easily be combined with the previously amplified clones to produce new libraries of increasing complexity. Our procedure is also compatible with any Y2H prey vector of choice, simply by adapting the sequences of the tailed PCR primers so that they match the prey vector and allow efficient in vivo homologous recombination (27). Furthermore, this library construction method also lends itself to non-nuclear two-hybrid systems, such as the membrane two-hybrid system (33,34). For that system, the use of clones/libraries that express full-length proteins is generally essential, and our protocol would be simplified by requiring only the full-length process.
Library Characterization
By analyzing a fraction of each separate PCR by agarose gel electrophoresis, we kept track of the PCR success rate. After optimization of the PCR parameters (see the Materials and Methods section), <5% of the wells failed to give rise to a visible PCR product as assessed by ethidium bromide staining. PCR bands were visually inspected and assigned as exhibiting strong (84% of reactions), weak (10% of reactions), or very weak (2% of reactions) staining intensity. Based on a PCR success rate of 95%, we estimate that approximately 10,000 different genes are represented in the RAFL library.
A total of 96 yeast transformants were randomly selected and assayed for insert size by colony PCR with AD vector-specific primers. None of the clones gave rise to a band of the size expected for a clone that contains no insert. Two of the 96 clones repeatedly failed to produce a PCR product, possibly due to the presence of longer inserts. Insert sizes among the remaining 94 clones ranged from 0.2 to 2.5 kb, with an average insert size of 0.8 kb.
A larger number of yeast transformants were randomly selected for DNA sequence analysis. Prey insert sequences were determined for 356 clones: 353 sequences matched 338 different human protein-coding genes, one clone encoded a human genomic sequence with no apparent ORF, and two sequences matched bacterial genes. As shown in Figure 2, 11 genes were observed more than once (between two and four times). Of these 11 genes, 7 were represented in the MGC plates more than once (between 2 and 13 wells/gene). In conclusion, no gene was found to be highly overrepresented, indicating that the library construction procedure did not result in any dramatic departure from normalization. In comparison, sequencing of randomly picked colonies from cDNA libraries derived from double poly(A)-selected messenger RNA (mRNA) typically reveals overrepresentation of abundantly expressed genes and a significant fraction of clones that do not match protein-coding sequences (data not shown). This observation is consistent with recent studies that reveal significant fractions of the human genome being transcribed, including into poly(A)-tailed RNAs that lack apparent coding function (35,36).
Library Test Screens and Retrieval of Interactors
To test the performance of the RAFL Y2H library, we performed Y2H screens using 24 baits derived from 10 different human proteins. Selection of these test baits was based on the following criteria: (i) each protein was to be represented by two or three baits with overlapping coordinates; (ii) all baits had to have been searched previously against several cDNA-derived AD libraries, in which the overlapping baits derived from the same protein had retrieved at least one common interactor; and (iii) at least one bait from each set of overlapping baits had previously retrieved interactor(s) that were expected to be represented in the new library. These conditions allowed selection of baits that we knew were not self-activating (37) and were likely to form correctly folded fusion proteins, because multiple overlapping baits interacted with identical prey(s). Furthermore, this experiment was designed to explore whether the RAFL library could yield novel interactions not identified in previous screens and to determine what fraction of "expected" interactions could be reproduced. An expected interactor was defined as a protein previously identified as an interactor in a Y2H experiment that was likely to be represented in the RAFL library because its gene was included on the 183 MGC plates and a visible band was observed after PCR amplification. All baits were screened twice against the RAFL library. The identities and amino acid coordinates of the 24 baits are listed in Table 2.
Table 2. Baits Used for Yeast Two-hybrid Test Screens
The results obtained from the Y2H screens are summarized in Table 3. We report a total of 31 prey interactors that were retrieved from the new library, of which 17 had not been retrieved from cDNA libraries (indicated as n in Table 3), and the remaining 14 were expected interactors (indicated as e+ in Table 3). The vast majority of interactors were obtained from more than one search, with only three interactors identified from a single search (indicated as 1s in Table 3). There were 12 interactions that we expected to find that were not retrieved from the RAFL library. Five of these had previously been retrieved only as solitary clones from a single cDNA library search (singletons). Since, in the absence of any corroborating evidence, Y2H singletons are generally more likely to represent false positives, and since we found no supporting evidence for any of these five interactions in the public domain, they have not been included in Table 3. By comparison, of the 14 expected interactors that were retrieved from the new library, only 2 had previously been retrieved as singletons. Of the remaining 7 missed expected interactors, 6 were not singletons, and 1 was a singleton that represented a published interactor. Clones derived from these 7 genes (indicated as e− in Table 3) may be underrepresented in the RAFL library. As described above, several measures were taken to minimize loss of normalization during library construction. Nevertheless, there are several steps in which introduction of bias could not be entirely avoided, in particular the two steps that involve PCR amplification and the yeast growth step prior to harvesting of the freshly prepared library. For all interactions listed in Table 3, multiple prey clones with overlapping coordinates were obtained, and 14 of the 38 interactions have independent support from the public domain (indicated as lit or lit* in Table 3). Therefore, this list represents a high-confidence set of interactions.
Table 3. Interaction Data from Yeast Two-hybrid Screens
Every row identifies one interaction pair. The Bait IDs in the left-most column refer to the baits as defined in Table 2. The abbreviations listed in the right-most column are defined as follows: e+, expected and found, the interactor was retrieved from both cDNA and random and full-length (RAFL) libraries; e−, expected not found, the interactor was only retrieved from cDNA library(s); n, new, the interactor was only retrieved from the RAFL library; lit, the interaction has been identified and reported independently; lit*, interaction with orthologous or paralogous protein partners has been identified and reported independently; 1s, identified in a single search. Interactions that are available in the public domain can be accessed via the National Center for Biotechnology Information (NCBI) EntrezGene portal (
The library construction strategy we used is not compatible with directional cloning, so half of all inserts are cloned in the incorrect orientation. It is therefore straightforward to identify search results that represent random noise, because these are the searches with equal representation of clones in either orientation. But even searches with good interactions can present with a background of random noise. We observed a background of singletons in about one-half of our searches with the 24 test baits (data not shown). On average, about one-third of the singletons were in the incorrect orientation. Because random noise is expected to be distributed equally between the two orientations, a similar number of correctly oriented singletons are likely to be noise as well, indicating that the majority of singletons do not represent valid interactions. Ignoring singletons retrieved from our normalized clone-derived library is likely to reduce false positives at the expense of slightly increasing the rate of false negatives in some searches. In comparison, weighing the balance between false positives and false negatives is more difficult for singletons retrieved from cDNA libraries, and these cannot be dismissed as easily. Singletons occur more frequently in searches of non-normalized cDNA libraries because many genes are represented at low abundance.
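The orientation argument above can be written as a simple estimate. The sketch below assumes that noise clones are equally likely to occur in either orientation and that every incorrectly oriented clone is noise, so the total noise fraction is about twice the incorrectly oriented fraction. The function name and the clone counts in the usage lines are hypothetical; the text reports only the approximate proportion of one-third.

```python
def estimate_noise_fraction(n_incorrect, n_total):
    """Estimate the fraction of clones that are orientation-independent noise.

    Assumptions (not a published method): noise clones are split evenly
    between the two orientations, and all incorrectly oriented clones are
    noise. The estimate is therefore twice the incorrectly oriented
    fraction, capped at 1.0.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_incorrect <= n_total:
        raise ValueError("n_incorrect must lie between 0 and n_total")
    return min(1.0, 2.0 * n_incorrect / n_total)


# Hypothetical counts: 10 of 30 singletons in the incorrect orientation
# (one-third), giving an estimated noise fraction of about two-thirds.
print(f"{estimate_noise_fraction(10, 30):.2f}")

# A search with equal numbers of clones in each orientation is estimated
# to be entirely noise.
print(f"{estimate_noise_fraction(15, 30):.2f}")
```

With small clone counts this estimate is noisy, so it is better suited to pooled singletons across many searches than to any single search.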
In conclusion, our RAFL Y2H library provides a useful complement to our existing suite of cDNA libraries. The new library performs qualitatively as predicted, retrieving expected interactors and, more importantly, interactors not previously obtained from cDNA library searches. Furthermore, the use of clone-derived normalized Y2H libraries can be expected to reduce the number of false positives that are reported. With the continued expansion of protein-coding clone collections, the usefulness of such Y2H libraries is poised to increase, and these types of clone-derived Y2H libraries can be expected to eventually replace cDNA libraries.