A combination of statistical algorithms is improving searches for protein-protein interaction (PPI) data through enhanced recognition of complicated semantic patterns in literature databases.
With the Multiple Kernel Learning framework, researchers from University College London (UCL), the University of Glasgow, and Cornell University combined several algorithms that together recognize a broader spectrum of the semantic variations used to describe PPIs. When used with a search engine, the combined algorithms return more results on PPIs by analyzing how frequently words co-occur.
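To illustrate the idea of combining kernels, here is a minimal sketch in Python. It is not the authors' implementation: the two base kernels (a bag-of-words kernel and a word-pair co-occurrence kernel), the function names, and the fixed weights are all assumptions made for illustration. In Multiple Kernel Learning the weights would be learned from data rather than set by hand.

```python
from collections import Counter
from itertools import combinations


def bag_of_words(sentence):
    """Count each lowercase, whitespace-separated token in the sentence."""
    return Counter(sentence.lower().split())


def cooccurrence_pairs(sentence):
    """Count unordered pairs of distinct tokens that appear in the same sentence."""
    tokens = sorted(set(sentence.lower().split()))
    return Counter(combinations(tokens, 2))


def dot(a, b):
    """Sparse dot product between two Counter feature vectors."""
    return sum(value * b[key] for key, value in a.items() if key in b)


def combined_kernel(s1, s2, weights=(0.5, 0.5)):
    """Weighted sum of two base kernels.

    The weights are fixed here (an assumption); an MKL method would learn them.
    """
    k_words = dot(bag_of_words(s1), bag_of_words(s2))
    k_pairs = dot(cooccurrence_pairs(s1), cooccurrence_pairs(s2))
    return weights[0] * k_words + weights[1] * k_pairs
```

A higher value means two sentences share more words and more word pairs, so a classifier built on the combined kernel can draw on both kinds of similarity at once.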
“We would like to take a database filled with papers and automatically find evidence that the author is discussing about proteins that interact with each other,” said study author and UCL professor Mark Girolami.
One set of algorithms learned new semantic variations by training on unannotated text from MEDLINE abstracts and open-access publications. To complement these algorithms, Girolami and colleagues integrated another algorithm that was trained on annotated data, using a semi-supervised learning approach. The combined algorithms produced statistically improved results, indicating a potential increase in the accuracy and sensitivity of searches for PPIs.
Search engines can recognize PPIs from the publications’ text when two protein names are linked by a verb or adjective in a sentence. But the clues are often unclear since the PPI information can be conveyed in a wide variety of ways. To find these semantic permutations, users must sift through the literature search results manually.
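The simple pattern described above, two protein names linked by an interaction word, can be sketched as follows. The protein and interaction-word lists are hypothetical placeholders, and the matching rule is a deliberately naive assumption, not the method of any particular search engine.

```python
import re

# Hypothetical lexicons for illustration only; real systems use curated resources.
PROTEINS = {"brca1", "rad51", "tp53", "mdm2"}
INTERACTION_WORDS = {"binds", "interacts", "phosphorylates", "inhibits"}


def find_candidate_ppis(sentence):
    """Return (protein, word, protein) triples where an interaction word
    appears between two neighbouring, different protein mentions."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    positions = [i for i, token in enumerate(tokens) if token in PROTEINS]
    hits = []
    # Compare each protein mention with the next one in the sentence.
    for start, end in zip(positions, positions[1:]):
        linking = [t for t in tokens[start + 1:end] if t in INTERACTION_WORDS]
        if linking and tokens[start] != tokens[end]:
            hits.append((tokens[start], linking[0], tokens[end]))
    return hits
```

A sentence such as "BRCA1 binds RAD51 in vitro." matches this rule, while "MDM2 is a negative regulator of TP53." does not, even though it also describes an interaction. That gap is the kind of semantic variation a fixed pattern misses.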
Previously, Girolami’s group deployed a single algorithm to search for PPIs. That algorithm produced a statistically ranked output but relied on a relatively small set of learned semantic patterns.
“To bench scientists, this work could provide them with a set of targeted papers that would be useful for whatever study they were engaged with or study outside of their research area,” said Girolami.
The paper, “Protein interaction sentence detection using multiple semantic kernels,” was published online on 14 May 2011 in the Journal of Biomedical Semantics.