Protein functional analysis in the post-genomic era is a huge task that must be approached with several methods in parallel. The use of protein-specific antibodies in conjunction with tissue microarrays has proven to be one important technology. In this study, we present a strategy for the optimized design of protein subfragments for subsequent antibody production. The fragments are selected according to the principle of lowest sequence similarity to other human proteins, ideally yielding antibodies with high selectivity. Furthermore, the fragments should have properties optimized for efficient protein production in Escherichia coli. The strategy has been implemented in Bishop, a Java-based software tool enabling the high-throughput production of protein fragments. Bishop lets the user avoid specified restriction enzyme sites, transmembrane regions, and signal peptides. A Basic Local Alignment Search Tool (BLAST) scanning procedure permits the selection of fragments of a given size with minimal sequence similarity to other proteins. The software and the strategy were evaluated on a human test data set and verified to fulfill the requested criteria.
A variety of technologies for studying proteins on a proteomic scale are now available. The principles of analysis, as well as the characteristics of the results, differ dramatically. Much effort is being put into protein expression analysis by various separation techniques coupled to mass spectrometry (1,2). Two-hybrid screens (3) are used to study protein-protein interactions for the functional characterization of proteins, and approaches for the large-scale determination of protein structures have been developed (4). Another, fundamentally different approach is epitope tagging of proteins (5) for subcellular localization using monoclonal antibodies.
Recently, an approach for localizing human proteins in cells and tissues was presented (www.hpr.se). This antibody proteomics technology involves the large-scale production of monospecific polyclonal antibodies for the cellular and subcellular localization of human proteins in tissue sections and cell cultures (6). By using tissue arrays (7), proteins can be localized in a large number of tissues, cell cultures, and disease states. Images of the tissues are stored in a protein atlas, which offers a powerful tool for determining protein function. An important part of the antibody proteomics technology is the design of protein subfragments, which are amplified by reverse transcription PCR (RT-PCR) and cloned into expression vectors. These subfragments, hereafter called protein epitope signature tags (PrESTs), are optimized for production in Escherichia coli and are expected to represent unique protein epitopes.
The aim of this work was to implement a strategy for the optimal design of PrESTs and to develop user-friendly software (Bishop) allowing high-throughput production of the protein fragments. The strategy was evaluated on a human test data set. We also present results from a study examining the potential of performing experiments in mouse using antibodies derived from human-mouse orthologs.
Materials and Methods
The strategy for the PrEST design is based on a set of rules that were developed for optimal protein production and purification as well as for ensuring a good immune response. The first rule was to avoid transmembrane regions. It has been reported that these regions are difficult to express in E. coli, and severe problems also occur in the refolding and purification process (8). In addition, transmembrane regions are not easily accessible to antibodies in immunolocalization studies. The second rule was to avoid signal peptides, since they are cleaved off during translocation and are therefore unsuitable as epitopes. The third rule was to design a protein fragment of suitable size, small enough for easy handling by PCR and cloning and large enough to provide conformational epitopes. The optimal size is usually 100–150 amino acids; for shorter gene products, the complete protein is selected. The fourth rule was to make the amino acid sequence of the fragments as unique as possible (compared to other human proteins) to avoid cross-reactivity of the generated antibodies.
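The first three rules can be read as constraints on where a fragment may be placed. The Python sketch below illustrates this and is not Bishop's code: it assumes that predicted transmembrane segments and signal peptides have already been converted into 0-based, half-open residue intervals, the default window of 125 residues is simply an assumed midpoint of the 100–150 range, and the function name `candidate_windows` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) residue range


def candidate_windows(protein_length: int, excluded: List[Interval],
                      size: int = 125) -> List[Interval]:
    """List every fragment of `size` residues that avoids all excluded regions.

    `excluded` holds predicted transmembrane segments and signal peptides
    (rules 1 and 2). Following rule 3, a protein no longer than `size` is
    returned whole, without checking the excluded regions.
    """
    if protein_length <= size:
        return [(0, protein_length)]
    windows: List[Interval] = []
    for start in range(protein_length - size + 1):
        end = start + size
        # Half-open intervals are disjoint if one ends before the other starts.
        if all(end <= ex_start or start >= ex_end for ex_start, ex_end in excluded):
            windows.append((start, end))
    return windows
```

For a 300-residue protein with a signal peptide at residues 0–25 and a helix at 100–123, `candidate_windows(300, [(0, 25), (100, 123)], size=50)` yields only windows lying entirely between or after those regions. Rule four, uniqueness, would then rank the surviving candidates.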
Bishop was developed in Java under Linux. The BioJava API (www.biojava.org) was used for sequence handling and visualization, and the relational database PostgreSQL (9) was used for storing input data and results. As described below, the main program calls several external Linux programs for specific analyses. A demo of the software and additional information are available online at biobase.biotech.kth.se/bishop.
PrEST Selection Steps
Bishop enables the user to perform a number of analyses on the selected sequence. Once the parameters are set, the analyses described below are either performed automatically or executed manually, one step at a time.
Identification of restriction enzyme sites
The user can select any restriction enzyme whose sites are to be displayed. It is important to avoid designing PrESTs in regions containing recognition sites for the restriction enzymes used in subsequent cloning.
Prediction of signal peptides and transmembrane regions
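A minimal sketch of the site search, assuming the coding DNA sequence of the protein is available as a string. The enzyme table is a hypothetical example (the EcoRI, NotI, and AscI recognition sequences), not the set used with Bishop. All three example sites are palindromic, so scanning one strand suffices; non-palindromic sites would also require scanning the reverse complement. The helper `window_has_site` (a hypothetical name) maps a residue window to its codons to test whether a candidate PrEST would contain a site.

```python
import re
from typing import Dict, List

# Hypothetical example enzymes; replace with those used in the actual cloning.
# All three recognition sequences are palindromic.
EXAMPLE_ENZYMES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "NotI": "GCGGCCGC",
    "AscI": "GGCGCGCC",
}


def find_restriction_sites(dna: str, enzymes: Dict[str, str]) -> Dict[str, List[int]]:
    """Return the 0-based start positions of each enzyme's recognition site."""
    dna = dna.upper()
    hits: Dict[str, List[int]] = {}
    for name, site in enzymes.items():
        # A zero-width lookahead also reports overlapping occurrences.
        pattern = re.compile("(?=" + re.escape(site) + ")")
        hits[name] = [m.start() for m in pattern.finditer(dna)]
    return hits


def window_has_site(hits: Dict[str, List[int]], enzymes: Dict[str, str],
                    aa_start: int, aa_end: int) -> bool:
    """True if any site overlaps the codons of residues [aa_start, aa_end)."""
    nt_start, nt_end = 3 * aa_start, 3 * aa_end
    for name, positions in hits.items():
        site_len = len(enzymes[name])
        for pos in positions:
            if pos < nt_end and pos + site_len > nt_start:
                return True
    return False
```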
Transmembrane regions are predicted with TMHMM (10), and signal peptides with SignalP (11). Both programs run under Linux, and their results are integrated and visualized in Bishop.
Scanning for unique regions
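One way to combine the two predictions into a single list of excluded regions is sketched below. It assumes TMHMM 2.0 was run with its one-line short output, in which a field such as `Topology=i101-123o200-222i` lists predicted helices as 1-based inclusive ranges, and that the signal peptide length (the residues before the SignalP-predicted cleavage site) has already been extracted; parsing SignalP output is not shown. The function names are hypothetical.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) residue range


def tm_intervals_from_tmhmm_short(line: str) -> List[Interval]:
    """Extract predicted helices from an (assumed) TMHMM 2.0 short-format line.

    TMHMM ranges are 1-based and inclusive; they are converted to 0-based,
    half-open intervals here.
    """
    match = re.search(r"Topology=(\S+)", line)
    if match is None:
        return []
    return [(int(a) - 1, int(b))
            for a, b in re.findall(r"(\d+)-(\d+)", match.group(1))]


def excluded_regions(tm: List[Interval], signal_peptide_end: int = 0) -> List[Interval]:
    """Merge helices and the signal peptide (residues before the cleavage
    site) into one sorted list of non-overlapping intervals."""
    regions = sorted(tm + ([(0, signal_peptide_end)] if signal_peptide_end > 0 else []))
    merged: List[Interval] = []
    for start, end in regions:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```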
One of the most important criteria for PrEST design is to find fragments with the lowest possible sequence similarity to other human proteins. To find such regions, a scanning procedure based on BLASTP (12) was developed. A window of fixed size slides along the protein, and each window is compared with all other proteins in the Ensembl database (www.ensembl.org). The score and E-value of the strongest sequence similarity at each step are then recorded (Figure 1). The window size can be set to any value, and the window advances one residue between successive BLASTP searches. Before the scan, the user can optionally remove sequences with more than 95% sequence identity to the query from the Basic Local Alignment Search Tool (BLAST) database. This keeps the scan curve informative, since near-identical sequences would otherwise dominate every window, and it signals that the possible role of the removed proteins (for example, in cross-reactivity) requires special consideration.
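The scan can be sketched as follows. Running BLASTP against Ensembl does not fit in a self-contained example, so a toy score (the number of distinct 3-mers a window shares with a database sequence) stands in for the BLASTP score, and a naive position-by-position identity stands in for the alignment-based identity behind the 95% filter. Only the best score per window is kept; no E-value is computed. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

ScoreFn = Callable[[str, str], float]
Profile = List[Tuple[int, str, float]]  # (window start, best subject id, best score)


def shared_kmer_score(window: str, subject: str, k: int = 3) -> float:
    """Toy stand-in for a BLASTP score: distinct k-mers shared by both sequences."""
    window_kmers = {window[i:i + k] for i in range(len(window) - k + 1)}
    subject_kmers = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return float(len(window_kmers & subject_kmers))


def naive_identity(a: str, b: str) -> float:
    """Toy stand-in for alignment identity: fraction of matching positions."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)


def split_near_identical(query: str, database: Dict[str, str],
                         threshold: float = 0.95) -> Tuple[Dict[str, str], List[str]]:
    """Drop subjects more than `threshold` identical to the query; the removed
    ids are returned so they can be reviewed separately."""
    kept: Dict[str, str] = {}
    removed: List[str] = []
    for subject_id, subject in database.items():
        if naive_identity(query, subject) > threshold:
            removed.append(subject_id)
        else:
            kept[subject_id] = subject
    return kept, removed


def scan_profile(query: str, database: Dict[str, str], window: int,
                 score_fn: ScoreFn = shared_kmer_score) -> Profile:
    """Slide a fixed-size window along the query one residue at a time and
    record the strongest hit against the database at each position."""
    profile: Profile = []
    for start in range(len(query) - window + 1):
        segment = query[start:start + window]
        best_id, best_score = "", 0.0
        for subject_id, subject in database.items():
            score = score_fn(segment, subject)
            if score > best_score:
                best_id, best_score = subject_id, score
        profile.append((start, best_id, best_score))
    return profile


def most_unique_window(profile: Profile) -> int:
    """Start of the window whose strongest hit is weakest (first one on ties)."""
    return min(profile, key=lambda entry: entry[2])[0]
```

Scanning an unfiltered database that contains the query's own sequence gives the same top score for every window, which is the flat, uninformative curve the 95% filter is meant to prevent; after `split_near_identical`, the profile varies along the protein and `most_unique_window` can pick the least similar region.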