Construction and characterization of a normalized yeast two-hybrid library derived from a human protein-coding clone collection
Jorja Degrado-Warren, Max Dufford, Jian Chen, Paul L. Bartel, Donna Shattuck, Georges C. Frech

DNA from both processes was combined at an estimated molar ratio of 20:1 in favor of material from the random fragment procedure, so that for each full-length insert, approximately 20 random fragments of that same insert would be represented in the library (DNA derived from sonication and CviJI digestion was combined at an equal weight ratio). Since the <1 kb fraction was not included within the random fragment material, additional DNA from the full-length <1 kb fraction was added to the final pool to ensure equimolar representation of all clones. A large-scale cotransformation of this final DNA pool together with linearized AD vector DNA into yeast generated our RAFL Y2H library. The incubation time before harvesting the yeast colonies was kept as short as possible to minimize loss of normalization of the library due to differential colony growth rates.
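
The link between a target molar ratio and the mass of DNA to combine can be sketched as follows. For double-stranded DNA, the number of moles is proportional to mass divided by length, so a molar ratio converts to a mass ratio once the average fragment lengths are known. The function name and the two mean lengths below are hypothetical placeholders chosen for illustration; they are not values reported in this study.

```python
def mass_ratio_for_molar_ratio(molar_ratio, mean_len_a_bp, mean_len_b_bp):
    """Return mass(A) / mass(B) needed so that moles(A) / moles(B) == molar_ratio.

    For double-stranded DNA, mass is proportional to moles x length,
    so the mass ratio is the molar ratio scaled by the length ratio.
    """
    if mean_len_a_bp <= 0 or mean_len_b_bp <= 0:
        raise ValueError("mean fragment lengths must be positive")
    return molar_ratio * mean_len_a_bp / mean_len_b_bp


# Hypothetical mean lengths (assumptions for illustration only).
RANDOM_FRAGMENT_MEAN_BP = 500
FULL_LENGTH_MEAN_BP = 2000

ratio = mass_ratio_for_molar_ratio(20, RANDOM_FRAGMENT_MEAN_BP, FULL_LENGTH_MEAN_BP)
print(f"random-fragment : full-length mass ratio = {ratio:.1f} : 1")
```

Under these assumed lengths, a 20:1 molar ratio corresponds to a much smaller mass ratio, because each random fragment is shorter than the full-length insert it derives from.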

Our library construction protocol can easily be tailored to any other clone collection one would like to use as input material. The only parameter that needs to be adapted is the identity of the vector-specific PCR primer pair(s). In the future, as additional protein-coding clones become available, it will be easy to combine the new clones with the previously amplified clones and produce new libraries of increasing complexity. Our procedure is also compatible with any Y2H prey vector of choice, by simply adapting the sequences of the tailed PCR primers so that they match the prey vector and allow efficient in vivo homologous recombination (27). Furthermore, this library construction method also lends itself to non-nuclear two-hybrid systems, such as the membrane two-hybrid system (33,34). For that system, the use of clones/libraries that express full-length proteins is generally essential, and our protocol would be simplified by requiring only the full-length process.

Library Characterization

By analyzing a fraction of each separate PCR by agarose gel electrophoresis, we kept track of the PCR success rate. After optimization of the PCR parameters (see the Materials and Methods section), <5% of the wells failed to give rise to a visible PCR product as assessed by ethidium bromide staining. PCR bands were visually inspected and assigned as exhibiting strong (84% of reactions), weak (10% of reactions), or very weak (2% of reactions) staining intensity. Based on a PCR success rate of 95%, we estimate that approximately 10,000 different genes are represented in the RAFL library.

A total of 96 yeast transformants were randomly selected and assayed for insert size by colony PCR with AD vector-specific primers. None of the clones gave rise to a band size indicative of clones that contain no insert. Two of the 96 clones repeatedly failed to produce a PCR product, possibly due to the presence of longer inserts. Insert sizes among the remaining 94 clones ranged from 0.2 to 2.5 kb, with an average insert size of 0.8 kb.

A larger number of yeast transformants were randomly selected for DNA sequence analysis. Prey insert sequences were determined for 356 clones: 353 sequences matched 338 different human protein-coding genes, one clone encoded a human genomic sequence with no apparent ORF, and two sequences matched bacterial genes. As shown in Figure 2, 11 genes were observed more than once (between two and four times). Of these 11 genes, 7 were represented in the MGC plates more than once (between 2 and 13 wells/gene). In conclusion, no gene was found to be highly overrepresented, indicating that the library construction procedure did not result in any dramatic departure from normalization. In comparison, sequencing of randomly picked colonies from cDNA libraries derived from double poly(A)-selected messenger RNA (mRNA) typically reveals that abundantly expressed genes are overrepresented and that a significant fraction of clones do not match protein-coding sequences (data not shown). This observation is consistent with recent studies that reveal significant fractions of the human genome being transcribed, including into poly(A)-tailed RNAs that lack apparent coding function (35,36).
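
As a rough point of comparison for the sequencing result, the number of distinct genes one would expect to see in a perfectly normalized library can be sketched with a simple sampling model. The model below is an idealization that is not part of the original analysis: it assumes every gene is equally represented and that clones are drawn independently, and it takes the approximate figure of 10,000 genes from the text as the library complexity.

```python
def expected_distinct(n_genes, n_draws):
    """Expected number of distinct genes seen in n_draws independent uniform
    draws (with replacement) from n_genes equally represented genes."""
    if n_genes <= 0 or n_draws < 0:
        raise ValueError("n_genes must be positive and n_draws non-negative")
    return n_genes * (1.0 - (1.0 - 1.0 / n_genes) ** n_draws)


N_GENES = 10_000          # approximate number of genes in the library (from the text)
N_CLONES = 353            # sequenced clones matching human protein-coding genes
OBSERVED_DISTINCT = 338   # distinct genes observed among those clones

expected = expected_distinct(N_GENES, N_CLONES)
print(f"expected distinct genes under uniform sampling: {expected:.1f}")
print(f"observed distinct genes: {OBSERVED_DISTINCT}")
```

Under this idealized model a small number of repeat observations is expected even with perfect normalization. The observed count is somewhat lower than the model value, which is in line with the note above that 7 of the 11 repeated genes occupied more than one well in the MGC plates.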

Library Test Screens and Retrieval of Interactors

To test the performance of the RAFL Y2H library, we performed Y2H screens using 24 baits derived from 10 different human proteins. Selection of these test baits was based on the following criteria: (i) each protein was to be represented by two or three baits with overlapping coordinates; (ii) all baits had to have been searched previously against several cDNA-derived AD libraries, in which the overlapping baits derived from the same protein had retrieved at least one common interactor; and (iii) at least one bait from each set of overlapping baits had previously retrieved interactor(s) that were expected to be represented in the new library. These conditions allowed selection of baits that we knew were not self-activating (37) and were likely to form correctly folded fusion proteins, because multiple overlapping baits interacted with identical prey(s). Furthermore, this experiment was designed to explore whether the RAFL library could yield novel interactions not identified in previous screens and to determine what fraction of "expected" interactions could be reproduced. An expected interactor was defined as a protein previously identified as an interactor in a Y2H experiment that was likely to be represented in the RAFL library because its gene was included on the 183 MGC plates and a visible band was observed after PCR amplification. All baits were screened twice against the RAFL library. The identities and amino acid coordinates of the 24 baits are listed in Table 2.

Table 2. Baits Used for Yeast Two-hybrid Test Screens

The results obtained from the Y2H screens are summarized in Table 3. We report a total of 31 prey interactors that were retrieved from the new library, of which 17 had not been retrieved from cDNA libraries (indicated as n in Table 3), and the remaining 14 were expected interactors (indicated as e+ in Table 3). The vast majority of interactors were obtained from more than one search, with only three interactors identified from a single search (indicated as 1s in Table 3). There were 12 interactions that we expected to find that were not retrieved from the RAFL library. Five of these had previously been retrieved only as solitary clones from a single cDNA library search (singletons). Since, in the absence of any corroborating evidence, Y2H singletons generally are more likely to represent false positives, and since we found no supporting evidence for any of these five interactions in the public domain, they have not been included in Table 3. As a comparison, of the 14 expected interactors that were retrieved from the new library, only 2 had previously been retrieved as singletons. Another 6 missed expected interactors were not singletons, and one missed singleton represented a published interactor. Clones derived from these 7 genes (indicated as e- in Table 3) may be underrepresented in the RAFL library. As described above, several measures were taken to minimize loss of normalization during library construction. Nevertheless, there are several steps in which introduction of bias could not entirely be avoided, in particular the two steps that involve PCR amplification as well as the yeast growth step prior to harvesting of the freshly prepared library. For all interactions listed in Table 3, multiple prey clones with overlapping coordinates were obtained, and 14 of the 38 interactions have independent support from the public domain (indicated as lit or lit* in Table 3). Therefore, this list represents a high-confidence set of interactions.
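
The bookkeeping behind these counts can be checked with a short tally. The sketch below uses only the numbers stated in the paragraph above; the variable names are illustrative, and the two recovery fractions it prints are derived here for convenience rather than reported in the original text.

```python
# Counts taken directly from the text.
retrieved_new = 17               # "n": not previously retrieved from cDNA libraries
retrieved_expected = 14          # "e+": expected and found
missed_expected = 12             # expected but not retrieved from the RAFL library
missed_singletons_excluded = 5   # unsupported singletons left out of Table 3
literature_supported = 14        # "lit" or "lit*"

retrieved_total = retrieved_new + retrieved_expected
missed_listed = missed_expected - missed_singletons_excluded    # "e-"
listed_total = retrieved_total + missed_listed                  # rows in Table 3

# Derived fractions (not stated in the text; shown for illustration).
recovery_all_expected = retrieved_expected / (retrieved_expected + missed_expected)
recovery_listed_expected = retrieved_expected / (retrieved_expected + missed_listed)
literature_fraction = literature_supported / listed_total

print(f"retrieved from RAFL library: {retrieved_total}")
print(f"missed but listed (e-): {missed_listed}")
print(f"interactions listed in Table 3: {listed_total}")
print(f"expected interactors recovered (all 26): {recovery_all_expected:.0%}")
print(f"expected interactors recovered (21 listed): {recovery_listed_expected:.0%}")
print(f"listed interactions with literature support: {literature_fraction:.0%}")
```

The totals reproduce the 31 retrieved interactors, the 7 missed interactors marked e-, and the 38 listed interactions.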

Table 3. Interaction Data from Yeast Two-hybrid Screens

Every row identifies one interaction pair. The Bait IDs in the left-most column refer to the baits as defined in Table 2. The abbreviations listed in the right-most column are defined as follows: e+, expected and found, the interactor was retrieved from cDNA and random and full-length (RAFL) libraries; e−, expected not found, the interactor was retrieved only from cDNA libraries; n, new, the interactor was retrieved only from the RAFL library; lit, the interaction has been identified and reported independently; lit*, interaction with orthologous or paralogous protein partners has been identified and reported independently; 1s, identified in a single search. Interactions that are available in the public domain can be accessed via the National Center for Biotechnology Information (NCBI) EntrezGene portal. All 38 listed interactions have been submitted to The International Molecular Exchange (IMEx) consortium via the Database of Interacting Proteins (DIP) and have been assigned IMEx identifiers IM-8753 to IM-8792.

The library construction strategy we used is not compatible with directional cloning, so half of all inserts are cloned in the incorrect orientation. It is therefore straightforward to identify search results that represent random noise, because these are the searches with equal representation of clones in either orientation. But even searches with good interactions can present with a background of random noise. We observed a background of singletons in about one-half of our searches with the 24 test baits (data not shown). On average, about one-third of the singletons were in the incorrect orientation, indicating that the majority of singletons do not represent valid interactions. Ignoring singletons retrieved from our normalized clone-derived library is likely to reduce false positives at the expense of slightly increasing the rate of false negatives in some searches. In comparison, weighing the balance between false positives and false negatives is more difficult for singletons retrieved from cDNA libraries, and these cannot be dismissed as easily. Singletons retrieved from non-normalized cDNA libraries are more frequent occurrences, due to the fact that many genes are represented at low abundance.
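
The step from "one-third in the incorrect orientation" to "the majority are not valid" can be made explicit with a small calculation. The sketch below rests on two simplifying assumptions that go beyond the text: random-noise clones are equally likely to be in either orientation, and clones representing valid interactions are always in the correct orientation. Under those assumptions, every incorrectly oriented clone is noise and is matched, on average, by one correctly oriented noise clone.

```python
def estimated_noise_fraction(incorrect_fraction):
    """Estimate the fraction of clones that are random noise from the
    fraction observed in the incorrect orientation.

    Assumes noise clones split 50/50 between orientations and valid
    interactors are always correctly oriented, so
    incorrect_fraction = noise_fraction / 2.
    """
    if not 0.0 <= incorrect_fraction <= 0.5:
        raise ValueError("incorrect_fraction must lie between 0 and 0.5")
    return 2.0 * incorrect_fraction


singleton_incorrect = 1 / 3   # "about one-third" from the text
noise = estimated_noise_fraction(singleton_incorrect)
print(f"estimated noise among singletons: {noise:.0%}")
print(f"estimated valid singletons: {1 - noise:.0%}")
```

With about one-third of singletons in the incorrect orientation, this estimate puts roughly two-thirds of singletons in the noise category, which is the basis for treating most singletons as invalid.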

In conclusion, our RAFL Y2H library provides a useful complement to our existing suite of cDNA libraries. The new library performs qualitatively as predicted, retrieving expected interactors and, more importantly, interactors not previously obtained from cDNA library searches. Furthermore, the use of clone-derived normalized Y2H libraries can be expected to reduce the number of false positives that are reported. With the continued expansion of protein-coding clone collections, the usefulness of such Y2H libraries is poised to increase, and these types of clone-derived Y2H libraries can be expected to eventually replace cDNA libraries.

Acknowledgments


We are grateful to past and present members of the Myriad ProNet team (Dan Cimbora, Monica Cronin, Heather Cummings, Christina Davenport, Ryan Doering, Melinda Jones, Kim Mauck, Chris Neff, Jonathan Nelson, Jimmy Park, Scott Patton, Todd Peterson, Michael Rector, Rosann Robinson, Sheila Towne, Daniel Wettstein, and Wei Xiong) for preparing DNA of the MGC clones and/or performing yeast two-hybrid searches and interaction confirmations, to Yang Chen for critical reading of this manuscript, to Mark McKellar, Andrew Morris, Brian Morris, Yuan Wan, and Linda Wong for bioinformatics support, to Natalia Gutin and Jeff Mitchell for robot programming support, and to members of Myriad's DNA Sequencing core facility.

Competing Interests Statement

All authors are employed by and have financial interest with Myriad Genetics, Inc.

References

1.) Brent R. Ptashne M., A eukaryotic transcriptional activator bearing the DNA specificity of a prokaryotic repressor, Cell, P729 - P736

2.) Ma J. Ptashne M., Deletion analysis of GAL4 defines two transcriptional activating segments, Cell, P847 - P853

3.) Fields S. Song O., A novel genetic system to detect protein-protein interactions, Nature, P245 - P246

4.) Yang M. Wu Z. Fields S., Protein-peptide interactions analyzed with the yeast two-hybrid system, Nucleic Acids Res., P1152 - P1156

5.) Ito T. Tashiro K. Muta S. Ozawa R. Chiba T. Nishizawa M. Yamamoto K. Kuhara S. Sakaki Y., Toward a protein-protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins, Proc. Natl. Acad. Sci. USA, P1143 - P1147

6.) Uetz P. Giot L. Cagney G. Mansfield A. T. Judson S. R. Knight R. J. Lockshon D. Narayan V., A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae, Nature, P623 - P627

7.) Rain C. J. Selig L. De Reuse H. Battaglia V. Reverdy C. Simon S. Lenzen G. Petel F., The protein-protein interaction map of Helicobacter pylori, Nature, P211 - P215

8.) Walhout J. A. Vidal M., High-throughput yeast two-hybrid assays for large-scale protein interaction mapping, Methods, P297 - P306

9.) Legrain P. Selig L., Genome-wide protein interaction maps using two-hybrid systems, FEBS Lett., P32 - P36

10.) Hudson R. J. Dawson P. E. Rushing L. K. Jackson H. C. Lockshon D. Conover D. Lanciault C. Harris R. J., The complete set of predicted genes from Saccharomyces cerevisiae in a readily usable form, Genome Res., P1169 - P1173

11.) Temple G. Lamesch P. Milstein S. Hill E. D. Wagner L. Moore T. Vidal M., From genome to proteome: developing expression clone resources for the human genome, Hum. Mol. Genet., PR31 - PR43

12.) Lamesch P. Li N. Milstein S. Fan C. Hao T. Szabo G. Hu Z. Venkatesan K., hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes, Genomics, P307 - P315

13.) Zhong J. Zhang H. Stanyon A. C. Tromp G. Finley L. R., A strategy for constructing large protein interaction maps using the yeast two-hybrid system: regulated expression arrays and two-phase mating, Genome Res., P2691 - P2699

14.) Rual F. J. Venkatesan K. Hao T. Hirozane-Kishikawa T. Dricot A. Li N. Berriz F. G. Gibbons D. F., Towards a proteome-scale map of the human protein-protein interaction network, Nature, P1173 - P1178

15.) Stelzl U. Worm U. Lalowski M. Haenig C. Brembeck H. F. Goehler H. Stroedicke M. Zenkner M., A human protein-protein interaction network: a resource for annotating the proteome, Cell, P957 - P968

16.) Jin F. Avramova L. Huang J. Hazbun T., A yeast two-hybrid smart-pool-array system for protein-interaction mapping, Nat. Methods, P405 - P407

17.) Flajolet M. Rotondo G. Daviet L. Bergametti F. Inchauspe G. Tiollais P. Transy C. Legrain P., A genomic approach of the hepatitis C virus generates a protein interaction map, Gene, P369 - P379

18.) Fromont-Racine M. Rain C. J. Legrain P., Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens, Nat. Genet., P277 - P282

19.) Formstecher E. Aresta S. Collura V. Hamburger A. Meil A. Trehin A. Reverdy C. Betin V., Protein interaction mapping: a Drosophila case study, Genome Res., P376 - P384

20.) Nakayama M. Kikuno R. Ohara O., Protein-protein interactions between large proteins: two-hybrid screening using a functionally classified library composed of long cDNAs, Genome Res., P1773 - P1784

21.) Li S. Armstrong M. C. Bertin N. Ge H. Milstein S. Boxem M. Vidalain O. P. Han D. J., A map of the interactome network of the metazoan C. elegans, Science, P540 - P543

22.) Bartel L. P. Roecklein A. J. SenGupta D. Fields S., A protein linkage map of Escherichia coli bacteriophage T7, Nat. Genet., P72 - P77

23.) Pennisi E., Working the (gene count) numbers: finally, a firm answer?, Science, P1113

24.) Gerhard S. D. Wagner L. Feingold A. E. Shenmen M. C. Grouse H. L. Schuler G. Klein L. S. Old S., The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC), Genome Res., P2121 - P2127

25.) Sambrook J. Russell D.W., Molecular Cloning: A Laboratory Manual, 3rd ed, CSH Laboratory Press, Cold Spring Harbor

26.) Fitzgerald C. M. Skowron P. Van Etten L. J. Smith M. L. Mead A. D., Rapid shotgun cloning utilizing the two base recognition endonuclease CviJI, Nucleic Acids Res., P3753 - P3762

27.) Fusco C. Guidotti E. Zervos S. A., In vivo construction of cDNA libraries for use in the yeast two-hybrid system, Yeast, P715 - P720

28.) Garrus E. J. von Schwedler K. U. Pornillos W. O. Morham G. S. Zavitz H. K. Wang E. H. Wettstein A. D. Stray M. K., Tsg101 and the vacuolar protein sorting pathway are essential for HIV-1 budding, Cell, P55 - P65

29.) Burke D. Dawson D. Stearns T., Methods in Yeast Genetics: A Cold Spring Harbor Laboratory Course Manual, CSH Laboratory Press, Cold Spring Harbor

30.) Bartel P.L. Fields S., The Yeast Two-Hybrid System, Oxford University Press, New York

31.) Strausberg L. R. Feingold A. E. Grouse H. L. Derge G. J. Klausner D. R. Collins S. F. Wagner L. Shenmen M. C., Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences, Proc. Natl. Acad. Sci. USA, P16899 - P16903

32.) Strausberg L. R. Feingold A. E. Klausner D. R. Collins S. F., The mammalian gene collection, Science, P455 - P457

33.) Stagljar I. Fields S., Analysis of membrane protein interactions using yeast-based technologies, Trends Biochem. Sci., P559 - P563

34.) Miller P. J. Lo S. R. Ben-Hur A. Desmarais C. Stagljar I. Noble S. W. Fields S., Large-scale identification of yeast integral membrane protein interactions, Proc. Natl. Acad. Sci. USA, P12123 - P12128

35.) Kapranov P. Cheng J. Dike S. Nix A. D. Duttagupta R. Willingham T. A. Stadler F. P. Hertel J., RNA maps reveal new RNA classes and a possible function for pervasive transcription, Science, P1484 - P1488

36.) ENCODE Project Consortium, Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature, P799 - P816

37.) Vidal M. Legrain P., Yeast forward and reverse 'n'-hybrid systems, Nucleic Acids Res., P919 - P929
