A targeted approach to using AI in drug discovery


Original story from Vanderbilt University (TN, USA).

How can the field of drug discovery build machine learning algorithms that better predict how a protein interacts with a small molecule?

The drug development pipeline is a costly and lengthy process. Identifying high-quality ‘hit’ compounds – those with high potency, selectivity and favorable metabolic properties – at the earliest stages is important for reducing cost and accelerating the path to clinical trials. For the last decade, scientists have looked to machine learning to make this initial screening process more efficient.

Computer-aided drug design is used to computationally screen for compounds that interact with a target protein. However, the ability to accurately and rapidly estimate the strength of these interactions remains a challenge.

“Machine learning (ML) promised to bridge the gap between the accuracy of gold-standard, physics-based computational methods and the speed of simpler empirical scoring functions,” commented Benjamin P. Brown, an assistant professor of pharmacology at the Vanderbilt University School of Medicine Basic Sciences (TN, USA). “Unfortunately, its potential has so far been unrealized because current ML methods can unpredictably fail when they encounter chemical structures that they were not exposed to during their training, which limits their usefulness for real-world drug discovery.”

In a recent paper, Brown proposes a targeted approach. Instead of learning from the entire 3D structures of a protein and a drug molecule, his task-specific model architecture is intentionally restricted to learn only from a representation of their interaction space, which captures the distance-dependent physicochemical interactions between pairs of atoms.

“By constraining the model to this view, it is forced to learn the transferable principles of molecular binding rather than structural shortcuts present in the training data that fail to generalize to new molecules,” Brown explained.
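To make the idea of an interaction-space representation concrete, here is a toy sketch in Python. It is not the representation from Brown's paper. The atom-type vocabulary, the distance-bin edges and the function name interaction_features are hypothetical choices made for illustration. The function counts protein-ligand atom pairs by atom-type pair and distance bin. The resulting fixed-length vector records only which kinds of atoms sit at which distances across the interface. It does not record the full structure of either molecule.

```python
import math
from itertools import product

# Hypothetical atom-type vocabulary; a real model would use richer physicochemical types.
ATOM_TYPES = ("C", "N", "O", "S")
# Hypothetical upper edges (in angstroms) of the distance bins [0,2), [2,4), [4,6), [6,8).
# Pairs at 8 angstroms or more are ignored.
BIN_EDGES = (2.0, 4.0, 6.0, 8.0)


def interaction_features(protein_atoms, ligand_atoms,
                         types=ATOM_TYPES, edges=BIN_EDGES):
    """Count protein-ligand atom pairs by (protein type, ligand type, distance bin).

    Each atom is a (type, (x, y, z)) tuple. Only intermolecular pairs are counted,
    so the vector describes the interaction space rather than either molecule alone.
    """
    # Map every ordered (protein type, ligand type) pair to a slot index.
    index = {pair: i for i, pair in enumerate(product(types, types))}
    n_bins = len(edges)
    features = [0] * (len(index) * n_bins)
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            key = (p_type, l_type)
            if key not in index:
                continue  # skip atom types outside the vocabulary
            d = math.dist(p_xyz, l_xyz)
            # Place the pair in the first bin whose upper edge exceeds its distance.
            for b, upper in enumerate(edges):
                if d < upper:
                    features[index[key] * n_bins + b] += 1
                    break
    return features


# Usage: one ligand carbon, 3 angstroms from a protein nitrogen and 5 from a protein oxygen.
protein = [("N", (0.0, 0.0, 0.0)), ("O", (8.0, 0.0, 0.0))]
ligand = [("C", (3.0, 0.0, 0.0))]
vector = interaction_features(protein, ligand)
```

A model trained on vectors like these never sees the ligand's chemical scaffold directly. That is the kind of restriction Brown describes. A practical featurization would likely use more informative atom types and finer distance resolution than this sketch does.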


A key aspect of Brown’s work is the rigorous evaluation protocol he developed. “We set up our training and testing runs to simulate a real-world scenario: ‘If a novel protein family were discovered tomorrow, would our model be able to make effective predictions for it?’” he shared. To do this, he left out entire protein superfamilies and all their associated chemical data from the training set, creating a challenging and realistic test of the model’s ability to generalize.
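The evaluation protocol described above can be sketched as a leave-one-superfamily-out split. This is a minimal sketch under stated assumptions, not code from the paper. The following are hypothetical:

- the record format, a dict with "superfamily" and "ligand" keys;
- the function name leave_superfamily_out_splits;
- the drop_shared_ligands option.

The option reflects one reading of "all their associated chemical data". Under that reading, training complexes whose ligand also appears in the held-out superfamily are removed as well.

```python
from collections import defaultdict


def leave_superfamily_out_splits(records, drop_shared_ligands=True):
    """Yield (held_out, train, test) triples, holding out one superfamily at a time.

    Each record is a dict with hypothetical 'superfamily' and 'ligand' keys.
    Every complex from the held-out superfamily goes to the test set. When
    drop_shared_ligands is True, training complexes whose ligand also appears
    in the test set are discarded so that no test chemistry leaks into training.
    """
    by_family = defaultdict(list)
    for rec in records:
        by_family[rec["superfamily"]].append(rec)

    for held_out in sorted(by_family):
        test = list(by_family[held_out])
        test_ligands = {r["ligand"] for r in test}
        train = [
            r
            for fam, recs in by_family.items()
            if fam != held_out
            for r in recs
            if not (drop_shared_ligands and r["ligand"] in test_ligands)
        ]
        yield held_out, train, test


# Usage with a tiny hypothetical dataset: ligand L2 binds members of both superfamilies.
records = [
    {"superfamily": "kinase", "ligand": "L1"},
    {"superfamily": "kinase", "ligand": "L2"},
    {"superfamily": "protease", "ligand": "L2"},
    {"superfamily": "protease", "ligand": "L3"},
]
splits = list(leave_superfamily_out_splits(records))
```

In each split, a model would be trained on train and scored on test. No test protein then comes from a superfamily the model saw during training. This is the scenario Brown's "novel protein family" question is meant to simulate.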

Brown’s work provides several key insights for the field:

  1. Task-specific architectures provide a clear avenue for building generalizable models using today’s publicly available datasets. Because the model is designed with a specific ‘inductive bias’ that forces it to learn from a representation of molecular interactions rather than from raw chemical structures, it generalizes more effectively.
  2. Rigorous, realistic benchmarks are critical. The paper’s validation protocol revealed that contemporary ML models performing well on standard benchmarks can show a significant drop in performance when faced with novel protein families. This highlights the need for more stringent evaluation practices in the field to accurately gauge real-world utility.
  3. Current performance gains over conventional scoring functions are modest, but the work establishes a clear, reliable baseline for a modeling strategy that doesn’t fail unpredictably, which is a critical step toward building trustworthy AI for drug discovery.

Brown, a core faculty member of the Center for AI in Protein Dynamics (TN, USA), knows that there is more work to be done. This project focused exclusively on scoring, ranking compounds based on the strength of their interaction with the target protein, which is only part of the structure-based drug discovery equation. “My lab is fundamentally interested in modeling challenges related to scalability and generalizability in molecular simulation and computer-aided drug design. Hopefully soon we can share some additional work that aims to advance these principles,” Brown concluded.

For now, significant challenges remain, but Brown’s work on building a more dependable approach for machine learning in structure-based computer-aided drug design has clarified the path forward.

This article has been republished from materials provided by Vanderbilt University. Material may have been edited for length and house style. For further information, please contact the cited source.

 

