There is an often unspoken truth behind the course of scientific investigation: it is shaped not necessarily by what is academically worthy of study, but by what is scientifically worthy in the eyes of funding agencies. The perception of worthy research, like cost in the simplest economic sense, is often driven by demand. Presently, the demand for novel diagnostic and therapeutic protein biomarkers that possess high sensitivity and specificity is having a major impact on the field of proteomics. The focal discovery technology being relied on is mass spectrometry (MS), yet the challenge of biomarker discovery often lies not in the application of MS but in the underlying proteome sampling and bioinformatic processing strategies. Although biomarker discovery research has historically been technology-driven, it is clear from the meager success in generating validated biomarkers that increasing attention must be paid to the pre-analytic stage, such as sample retrieval and preparation. As diseases vary, so do the combinations of sampling and sample analyses necessary to discover novel biomarkers. In this review, we highlight different strategies used toward biomarker discovery and discuss them in terms of their reliance on technology and methodology.
Protein biomarker discovery requires multidisciplinary strategies that incorporate intelligent sample collection, processing, data acquisition, and analysis (2). Protein identification has historically been accomplished by Edman degradation. Although arguably a powerful technology, it is limited in both throughput and scope, as proteins can only be sequenced one at a time and only after appreciable purification. Furthermore, even when these criteria are fulfilled, the technology falls short in that it provides no facile translation to routine protein assay development. Improvements in biomarker assays have all relied on the development and use of high-affinity reagents (i.e., antibodies). The development of modern separations coupled with advanced mass spectrometry (MS) has ushered in a new paradigm in the throughput and scope with which proteins can be identified and characterized. These proteomic-scale capabilities now enable thousands of proteins to be identified from complex mixtures (1). Indeed, no other technology parallels the capability of MS in the identification of proteins from complex proteomic samples, such as serum, plasma, urine, or cerebrospinal fluid (CSF). It is for these reasons that conventional wisdom suggests that, through application of these new tools, novel and specific disease biomarkers will be identified. Accordingly, the recent past has seen an exponential rise in data acquisition (i.e., MS) and data analysis (i.e., computer hardware and software) capabilities, while it may be argued that commensurate advances in sample collection and processing have lagged. The increasing power of MS and bioinformatic tools often results in experimental designs that are overly dependent on technology and suffer from a lack of imaginative sample preparation.
As shown in Figure 1, there is often an inverse relationship between the complexity of sample preparation and the amount of data acquired or the sophistication of the bioinformatic analysis. Simply put, minimal sample preparation prior to MS analysis will require more data acquisition and more sophisticated bioinformatic analysis. There is, however, a direct correlation between the amount of data acquired and the sophistication of the bioinformatic analysis.
Analytical Sampling of Biofluids
In the simplest sense, the goal of protein biomarker discovery is to identify a protein or panel of proteins that distinguish patients afflicted with a particular disease from healthy individuals (2). While the premise seems simple enough, achieving this goal has not been a trivial pursuit. Ideally, such a biomarker or biomarkers would be assayable in biological samples obtained through minimal invasion. Biofluids such as urine, serum, and plasma readily fulfill such criteria and are routinely collected during physical examinations. Unfortunately, these biofluids represent an extremely difficult matrix to characterize, even by the most advanced MS technologies. These difficulties are clear if one considers the physiological and analytical challenges in discovering, for example, a tumor-specific protein biomarker in serum. Assume that a population of tumor cells secretes an aberrant protein into the circulatory system. The blood, collected from a vein at the inner elbow and from which the serum sample is prepared, is derived from a 7.5-L circulatory system that encompasses approximately 100,000 km of veins, arteries, and capillaries. While the local concentration of the biomarker may be high in the microenvironment of the tumor, its travels take it through thousands of kilometers of biological highways until it reaches the point of extraction (i.e., inner elbow). This journey will have many confounding effects that present several analytical challenges to its facile detection, the most obvious of which is dilution. The high concentration of the biomarker within the vicinity of the tumor will be dramatically decreased as it moves within the circulatory system. Since the activity level of proteases in blood is high, the biomarker may also be digested into a variety of different fragments prior to collection. 
Therefore, the primary sequence, and potentially the functional significance, of the biomarker can change radically between its point of entry into the circulatory system and its collection.
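The scale of the dilution effect described above can be illustrated with a rough calculation. The following is a minimal sketch only, not a physiological model. It assumes uniform mixing throughout the circulatory volume and ignores clearance and proteolysis. The secreted mass (1 µg) and local volume (1 mL) are hypothetical values chosen for illustration. The default circulatory volume is the figure quoted in the text, and the function names are ours.

```python
# Illustrative arithmetic only. Assumptions (not from the source):
# uniform mixing, no clearance, no proteolytic loss, hypothetical inputs.

ML_PER_L = 1000.0


def diluted_concentration_ng_per_ml(secreted_ng: float,
                                    blood_volume_l: float = 7.5) -> float:
    """Concentration (ng/mL) after a secreted mass mixes evenly
    through the whole circulatory volume."""
    if blood_volume_l <= 0:
        raise ValueError("blood_volume_l must be positive")
    return secreted_ng / (blood_volume_l * ML_PER_L)


def dilution_factor(local_volume_ml: float,
                    blood_volume_l: float = 7.5) -> float:
    """Fold decrease in concentration when material in a small local
    volume is spread through the whole circulatory volume."""
    if local_volume_ml <= 0:
        raise ValueError("local_volume_ml must be positive")
    return (blood_volume_l * ML_PER_L) / local_volume_ml


if __name__ == "__main__":
    secreted_ng = 1000.0      # hypothetical: 1 microgram of biomarker
    local_volume_ml = 1.0     # hypothetical tumor microenvironment volume

    local = secreted_ng / local_volume_ml
    systemic = diluted_concentration_ng_per_ml(secreted_ng)
    print(f"local concentration:    {local:.1f} ng/mL")
    print(f"systemic concentration: {systemic:.4f} ng/mL")
    print(f"dilution factor:        {dilution_factor(local_volume_ml):.0f}x")
```

Under these assumptions, a protein present at 1000 ng/mL near the tumor would be diluted several thousandfold by the time it reaches the collection site, before any loss to protease activity is considered.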
