2, Indiana University Research and Technology Corporation, Bloomington, Indiana, USA
The analysis of insoluble proteins represents a major technical challenge for the field of proteomics. For example, membrane proteins are often insoluble in common solvents and represent 20–30% of the proteins encoded by the human genome. Chemical analysis on an individual basis is often required and is laborious and time-consuming. This review presents an overview of methods for purification of expressed proteins using fusion tags as well as methods for analysis of insoluble proteins by mass spectrometry with a goal of achieving high-throughput analysis.
The human genome is estimated to encode 28,913 distinct proteins (1). The research effort to identify, characterize, and analyze the proteins encoded by the human genome, as well as by the genomes of other organisms, constitutes the field of proteomics. The complexity of this task is increased by the need to study proteins at the level of cells and tissues, as well as at the organismal level. A variety of techniques are employed in this effort, including both computational and biochemical methods. There have been many technical advances in the field, such as two-dimensional (2-D) gels, mass spectrometry (MS), and robotics. However, a significant technical barrier remains: many proteins are insoluble in common solvents. For example, membrane proteins can be insoluble because they are hydrophobic. Moreover, misfolded proteins have exposed hydrophobic regions and can form insoluble aggregates. Many recombinant proteins, when overexpressed in a heterologous host, become insoluble because of misfolding.
So why study insoluble proteins? Misfolded proteins can cause disease, as in the amyloid-β plaques of Alzheimer's disease (2). Moreover, many proteins, membrane proteins among them, are poorly soluble or entirely insoluble when extracted from their native environment. It has been estimated that 20–30% of the genes in the human genome encode membrane proteins (3,4), yet fewer than 1% of the proteins of known structure are membrane proteins (5). Membrane proteins include receptors and ion channels, which represent important potential targets for therapeutics. An example of this complexity from the field of neurobiology is the synapse, which contains more than 2000 proteins (6). Tissue samples are more complex still: they contain a huge array of compounds, many of which are insoluble. These samples must retain their 3-D structure during analysis because it is important to determine where particular molecular species are located within small areas of tissue (for example, specific brain centers, where location can be crucial for assessing biological function).
The experimental study of insoluble proteins presents a challenge to the field of proteomics. While a number of powerful techniques have been developed for protein profiling, it is still necessary to purify individual proteins. Moreover, given the unique properties of each protein, it is still necessary to work out individual conditions on a case-by-case basis. For example, a thorough understanding of an individual protein requires the determination of its crystal structure, which often requires expression of recombinant proteins in order to obtain a sufficient quantity of purified protein for crystallization. Similarly, milligram quantities of a purified protein are required to generate antibodies for use in subsequent analyses such as antibody arrays; these quantities are also dependent upon recombinant protein expression and purification. This review presents an overview of available methods useful in the study of insoluble cellular proteins, as well as in the expression and purification of insoluble recombinant proteins.
Expression and purification of recombinant proteins
Just as the best method for protein purification must be determined on an individual basis, the best method for expression and purification of recombinant proteins must also be determined on an individual basis. Expression systems have been created for numerous hosts including E. coli, yeast, insect, and mammalian cells. However, it is often not clear which host will produce a sufficient yield of expressed, soluble protein. In many cases it is necessary to try various hosts and expression systems. This task has been greatly facilitated by the introduction of recombination-based cloning systems, which are available for multiple host cell types (7). Once a gene encoding a protein of interest has been cloned into the first vector, the gene may then be transferred by recombination into multiple additional vectors without the need for recloning. Although each expression host offers particular advantages, the first choice for simplicity and yield is E. coli. However, when heterologous proteins are expressed in E. coli or other expression hosts, they often form insoluble inclusion bodies. Proteins can be purified from inclusion bodies, but refolding the protein is required and proteins have variable refolding properties (8).
A valuable tool for recombinant protein expression and purification is the use of epitope tagging in which a polypeptide tag is fused to the target protein in a suitable expression vector. Again, the process is facilitated through the use of recombination-based cloning vectors as numerous choices of vectors and epitope tags are available (9). The expressed protein may then be purified using a ligand with affinity for the epitope tag. One of the first epitope tags to prove useful for purifying recombinant proteins was the His tag (10). His-tagged proteins can be purified by immobilized metal affinity chromatography and the method is compatible with chaotropic agents (urea or guanidine hydrochloride) so that proteins from inclusion bodies can be solubilized and bound to the affinity matrix (11). Furthermore, if the fusion protein contains the recognition sequence for a specific protease, the affinity tag can be removed while the protein is immobilized on the affinity column.