a freelance writer in Boston, MA.
Detecting DNA sequence polymorphisms is illuminating, but it is not the whole picture. Heterozygosity might imply a balanced mixture of two alleles, but roughly half the time, one allele is expressed at least 2-fold more than the other. Even looking at the RNA may not reveal the full story, since differences in translation and protein turnover might further skew the allelic (im)balance. Furthermore, efforts to determine disease-causing mutations might benefit if the search were limited to factors that are expressed in the affected tissue. For these reasons, reliably detecting polymorphisms via proteomics data would be an important advance. In a recent study, Bunger et al. describe their efforts to detect non-synonymous coding SNPs from LC-MS/MS data. However, parsing shotgun proteomics data to detect amino acid changes presents many challenges. Protein identifications are made on the basis of matching the mass of a trypsin-generated peptide to a database entry. A shift in peptide mass might indicate a residue change resulting from a SNP, or it could be the wild-type sequence with either a bona fide posttranslational modification or a chemical modification incurred during sample processing. Alternatively, a peptide containing an amino acid difference might be misidentified as belonging to another protein. Faced with these confounding issues, Bunger et al. give ample attention to strategies to reduce false positives, including filtering potential hits to those consistent with the peptide's pI and using DNA sequencing to confirm novel SNPs. All told, the authors detected 629 non-synonymous SNPs from a whole-cell extract of a human breast cancer line. Although the overall process is currently considerably more complicated than DNA-based SNP detection, the authors remind readers that MS-based SNPing can be performed by reanalyzing previously collected data. As filters and database searches improve, stored data sets may yet reveal a wealth of information about the diversity to be uncovered in population-based proteomics.
- Bunger et al. 2007. Detection and validation of non-synonymous coding SNPs from orthogonal analysis of shotgun proteomics data. Journal of Proteome Research [Epub ahead of print, May 9, 2007].
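The ambiguity between a SNP-derived residue change and a modification is easy to see with a little arithmetic. The sketch below is an illustration only, not the authors' pipeline: the function names and the 0.01 Da tolerance are assumptions made for this example, and the residue masses are the standard monoisotopic values. It shows that some single amino acid substitutions shift a peptide's mass by almost exactly the same amount as a common modification.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide

# Mass shifts of two common modifications (daltons).
MODIFICATION_SHIFT = {"oxidation": 15.99491, "deamidation": 0.98402}


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def substitution_shift(ref_residue, alt_residue):
    """Mass change caused by replacing one residue with another."""
    return RESIDUE_MASS[alt_residue] - RESIDUE_MASS[ref_residue]


def confusable_modifications(ref_residue, alt_residue, tolerance=0.01):
    """Modifications whose mass shift matches the substitution within
    the tolerance (hypothetical helper; the tolerance is an assumption)."""
    shift = substitution_shift(ref_residue, alt_residue)
    return [name for name, delta in MODIFICATION_SHIFT.items()
            if abs(shift - delta) <= tolerance]


if __name__ == "__main__":
    for ref, alt in [("A", "S"), ("N", "D"), ("G", "W")]:
        print(ref, "->", alt,
              round(substitution_shift(ref, alt), 4),
              confusable_modifications(ref, alt))
```

An Ala-to-Ser change, for instance, adds about 15.995 Da, the same as an oxidation, and Asn-to-Asp adds about 0.984 Da, the same as a deamidation. Mass alone cannot separate these cases, which is why orthogonal filters such as pI and follow-up DNA sequencing are useful.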
A Red-Shifted Star
As flexible as GFP has proven to be, tinkering with this protein has never succeeded in producing a red-shifted variant that is bright and that does not depend upon multimerization in order to fluoresce. To fill this need, Calloway et al. turned to synthetic chemistry instead of mutagenesis. Members of the team had already developed a protein-labeling system based on Escherichia coli dihydrofolate reductase (eDHFR) and fluorescent derivatives of the eDHFR ligand trimethoprim (TMP). Unfortunately, this first generation of fluorescent TMP conjugates gave disappointing results when used with cytoplasmic proteins, because a nonspecific speckling pattern would obscure the signal. Calloway et al. decided to revisit the TMP-eDHFR system and see if re-engineered TMP-fluorophore conjugates could overcome the signal-to-noise limitations. The authors synthesized two new compounds, which differ from the original TMP conjugates in the dye used (fluorescein or hexachlorofluorescein) and in the use of a more hydrophilic linker to join the dye to TMP. To ease cell permeability, these new conjugates were synthesized with hydrophobic protecting groups, which are subsequently hydrolyzed in the cytosol. Both new TMP-fluorophore conjugates outperformed the first-generation compounds by achieving specific, intense staining of eDHFR that had been fused to either α-tubulin or myosin light-chain kinase. And, since the hexachlorofluorescein TMP conjugate is red-shifted compared to GFP, the new labeling system can be used in tandem with existing GFP fusion proteins. The authors conclude by describing other possible extensions of the TMP-eDHFR system, but this versatile red-shifted TMP conjugate is itself likely to be a standout in a galaxy of fluorescent labeling options.

Image reprinted with permission. © 2007 John Wiley & Sons, Inc.
- Calloway et al. 2007. Optimized fluorescent trimethoprim derivatives for in vivo protein labeling. Chembiochem 8:767-774.
Acquired Immunity
Although bacteria are more familiar as a source of antigens than antibodies, a group of researchers from the University of Texas at Austin has managed to harness Escherichia coli for antibody production. It is true that IgG heavy and light chains have been successfully expressed in E. coli before, but an integrated strategy for selection and isolation of full-length IgG antibodies has not been previously demonstrated. The new work, from Georgiou and colleagues, describes the so-called E-clonal approach. The process begins with the isolation of RNA from the splenocytes of mice immunized with the antigen of interest. The resulting cDNA is used to create a plasmid library for expression of the VH and VL genes. This library is transformed into E. coli that express a fusion protein consisting of one domain designed to tether the chimeric protein to the periplasmic face of the bacterial inner membrane, and another domain that binds the Fc portion of IgG. This fusion protein acts as an anchor to hold any IgG produced by the cell within the periplasmic space. This is crucial because the next step calls for the outer membrane of the bacteria to be permeabilized. As a result, the antibodies are displayed on the outer surface of the resulting spheroplast. These spheroplasts are then subjected to flow cytometry-mediated selection of those that bind to a fluorescently labeled antigen. The authors put the strategy through its paces by producing antibodies directed against the Protective Antigen of Bacillus anthracis. The resulting IgGs showed affinities within the range expected for antibodies produced by more typical methods. Although the resulting IgG is not glycosylated, it has previously been shown that aglycosylated antibody can be used directly in animal experiments. It is true that the E. coli-generated IgG molecules may fail to rouse antibody-dependent cell-mediated cytotoxicity (as would be required for therapeutic molecules), but the simplicity and elegance of this approach can nonetheless be expected to make E. coli the organism of choice for many antibody production applications.
- Mazor et al. 2007. Isolation of engineered, full-length antibodies from libraries expressed in Escherichia coli. Nature Biotechnology 25:563-565.
Double or Nothing
Medical sequencing of genomic DNA provides detail that whole-genome SNP scans cannot. For example, even though chips can be used to infer copy number variation, they are powerless to pick up indels, which have been linked to a number of diseases under active study. Although sequencing can detect polymorphisms of this sort, reads from genomic DNA are complicated by dueling signals of the diploid template. Sequencing analysis tools have done much to ease detection of SNPs from biallelic samples; however, parsing the overlapping reads resulting from a heterozygous indel is much trickier. To meet this need, members of Washington University's Genome Sequencing Center have developed PolyScan, a new algorithm and software package for automatic detection of indels in human resequencing data. PolyScan begins with the chromatogram and base calls generated by established programs such as phred. The program then reanalyzes the trace in order to detect additional peaks that might reveal a hidden, nonreference sequence, and uses alignment and noise reduction algorithms to identify the heterozygous indels. To prove the value of the system, the authors revisited sequence data on 13 genes that had been manually annotated in a previous study and for which 1546 heterozygous indels had been found. About 84% of the indels that had been detected by hand were picked up by PolyScan. This performance exceeds the capabilities of PolyPhred, which detected about 72% of the indels and showed a lower specificity. PolyScan also compares favorably to Mutation Surveyor and InSNP, two tools that are better suited to smaller-scale sequencing projects. PolyScan does have some limitations, including false positives caused by low-quality sequence regions. Nevertheless, PolyScan will be an extremely attractive tool as large-scale medical sequencing projects become increasingly common and manually parsing biallelic sequence proves unworkable.

- Chen et al. 2007. PolyScan: An automatic indel and SNP detection approach to the analysis of human resequencing data. Genome Research 17:659-666.
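To see why a heterozygous indel produces "dueling signals," it helps to model a diploid read as a set of bases at each position. The toy sketch below is not PolyScan's algorithm, which works on real chromatogram peaks with alignment and noise reduction. It assumes a noise-free trace and a single heterozygous deletion, and all function names are hypothetical. Downstream of the deletion the two alleles fall out of phase, so nearly every position shows two peaks, and the deletion length can be recovered by finding the shift of the reference that explains them.

```python
def mixed_trace(allele_a, allele_b):
    """Bases observed at each position when both alleles are read together
    (idealized: no noise, equal signal from each allele)."""
    length = min(len(allele_a), len(allele_b))
    return [{allele_a[i], allele_b[i]} for i in range(length)]


def infer_deletion(reference, trace, max_length=10):
    """Return (start, length) of a heterozygous deletion consistent with
    the trace, or None if no double peaks or no fit is found."""
    # The first double peak marks where the alleles go out of phase.
    start = next((i for i, peaks in enumerate(trace) if len(peaks) > 1), None)
    if start is None:
        return None
    for length in range(1, max_length + 1):
        # Candidate second allele: the reference with `length` bases removed.
        candidate = reference[:start] + reference[start + length:]
        span = min(len(trace), len(candidate))
        if all(trace[i] == {reference[i], candidate[i]} for i in range(span)):
            return start, length
    return None


if __name__ == "__main__":
    reference = "ACGTACGGTCATTGCA"
    variant = reference[:5] + reference[7:]  # 2-base deletion at index 5
    trace = mixed_trace(reference, variant)
    print(infer_deletion(reference, trace))
```

In repetitive sequence, several placements of the same deletion can explain the trace equally well; this sketch simply reports the first one it finds.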