Many obstacles lie in the path leading from basic research to clinical biomarker identification and validation. While gene expression microarrays, proteomics tools, imaging techniques, and next-generation sequencing can generate massive amounts of data, differences among tools, platforms, and samples (such as various chip manufacturers, differing patient and normal subject populations, and differences among software platforms used for data collection and analysis) have hampered clinical translation. But new initiatives aimed at standardization are beginning to emerge to meet this need.
The National Cancer Institute (NCI) established and sponsors the cancer Biomedical Informatics Grid (caBIG) as a way for members of the cancer community (including researchers, physicians, and patients) to share information. One of the major goals of caBIG is to develop standards and tools for data acquisition, analysis, and dissemination, and it is committed to an open source environment for tool development, access to resources, and source code. Currently, the caBIG community includes more than 50 NCI-designated cancer centers, 30 other federal, academic, not-for-profit, and industry groups, numerous NCI-supported research groups, and over 900 individual researchers who have contributed to the development of existing tools and services, such as The Cancer Genome Atlas, clinical trial compatibility framework tools [e.g., Cancer Adverse Event Reporting System (caAERS)], life science distribution tools [e.g., Cancer Genome-Wide Association Studies (caGWAS)], and a data sharing and security framework tool. To further improve compatibility among the broad range of tools and software programs being used in cancer research, caBIG now offers a compatibility certification program.
Certification
May Dongmei Wang and her group at the Wallace H. Coulter Department of Biomedical Engineering at the Georgia Institute of Technology recently developed two new software programs that remove noise and artifacts from gene expression microarray data to enable identification of potential cancer biomarkers. Wang submitted those programs, called caCORRECT (chip artifact CORRECTion) and omniBio-Marker, to caBIG for certification. “When we went through the review process, caBIG was supportive and impressed because we developed our software in-house, not through their funding,” she says.
According to Wang, NCI-funded researchers are now strongly encouraged to ensure that their data and tools can interface with the grid to facilitate sharing. Any new software systems must also be compatible and able to communicate. There are two types of compatibility reviews at caBIG: one concerning syntactic compatibility with the grid, which requires the use of Java programming, and the other concerning semantic compatibility, which requires the use of UML (Unified Modeling Language). Wang was funded to develop her caCORRECT and omniBio-Marker programs by the National Institutes of Health, the Georgia Cancer Coalition, Microsoft Research, and Hewlett-Packard. To be considered for certification, groups like hers must undergo a thorough review by a caBIG-appointed committee consisting of two known and two anonymous mentors. The review process can often take a year or more.
In the end, Wang's programs were given silver-level compatibility certification, meaning that these tools were judged to be applicable to most currently funded caBIG developer projects. Other levels of caBIG certification include: (i) legacy, applicable to those systems designed without awareness of or prior to the availability of the compatibility guidelines, and which do not meet any requirements for interoperability; (ii) bronze, designed for certification of software products not created as part of the caBIG program but capable of being certified as compliant with caBIG guidelines; and (iii) gold, the highest level, currently being defined for a “formalized grid architecture and data standards that will enable standardized advertising, discovery, and use of all federated caBIG resources.” Wang says caBIG certification is important because it indicates that anyone in the cancer research community can use the tools and assures users that the software tools are interoperable.

