Lassoing new therapeutics with computational biology

Original story from the University of Illinois Urbana-Champaign (IL, USA).
A new large language model predicts the properties of lasso peptides, natural products of bacteria that have therapeutic potential.
In the hunt for new therapeutics for cancer and infectious diseases, lasso peptides prove to be a catch. Their knot-like structures afford these molecules high stability and diverse biological activities, making them a promising avenue for new therapeutics. To better unleash their clinical potential, a collaborative team led by members of the Carl R. Woese Institute for Genomic Biology (IL, USA) developed LassoESM, a new large language model for predicting lasso peptide properties.
Lasso peptides are natural products made by bacteria. To produce these peptides, bacteria use ribosomes to build chains of amino acids that are then folded by biosynthetic enzymes into a unique slipknot-like structure. Through this process, thousands of different lasso peptides are generated, many of which have demonstrated antibacterial, antiviral and anticancer properties.
“There are striking opportunities to use lasso peptides in drug discovery, from targeting receptors to developing stable oral therapeutics,” shared Doug Mitchell, the Director of the Vanderbilt Institute for Chemical Biology (TN, USA) and co-leader of the study. “By building a dedicated language model for these molecules, we’ve created a tool that helps us unlock these possibilities far more efficiently.”
Machine learning models have become essential tools for researchers, particularly for recognizing patterns in large datasets. This enables scientists to find new connections, while also saving months of time and effort. Protein prediction especially benefits from this technology, helping to uncover new insights into complex protein interactions and accelerate the discovery of new therapeutics. But commonly used AI platforms for protein prediction, such as AlphaFold, fall short when tasked with lasso peptides.
“Because of the unique structure of the lasso peptide, none of the current AI programs actually work in terms of doing a structure prediction,” explained project co-leader Diwakar Shukla, a professor of chemical and biomolecular engineering and James W. Westwater Professorial Scholar at the University of Illinois Urbana-Champaign (IL, USA).
Similar to the large language models powering AI chatbots, protein language models are trained to learn and apply the language of proteins: their amino acid sequences, three-dimensional structures and interactions with surrounding environments. But without lasso peptide-specific training data, these general-purpose models lack specificity for the molecules.
“Predicting lasso peptide properties has been challenging due to the scarcity of experimentally labeled data and the complexity of enzyme–peptide substrate interactions,” commented Xuenan Mi, who recently earned her PhD in Shukla’s research group. “We developed LassoESM, a lasso peptide-tailored protein language model, to capture peptide-specific features that are often missed by generic protein language models.”
Mitchell’s group first used bioinformatics methods to find thousands of lasso peptide sequences that different microorganisms produce. To improve the quality of the data, the team also manually validated any new lasso peptide sequences they discovered.
“Then, we learned the language of those lasso peptides using masked language modeling, which is where you hide part of the peptide, and then you try to predict the other half,” Shukla explained. “Once you have learned the language of how the lasso structure is formed in nature, then you can train efficient property prediction models based on these language model parameters.”
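As a rough illustration of the masking step, the sketch below hides a fraction of the residues in a peptide and records which residues a model would need to recover at the hidden positions. It is a minimal example under stated assumptions, not the team's training code: the function names, the 15% mask fraction, the `<mask>` token and the example sequence are all illustrative choices that do not come from the study.

```python
import random

MASK = "<mask>"  # placeholder token; the real tokenizer is not described in the article


def mask_peptide(sequence, mask_fraction=0.15, seed=0):
    """Build one masked-language-modeling example from a peptide sequence.

    Returns (masked_tokens, labels). labels[i] holds the hidden residue at
    masked positions and None everywhere else.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    # Mask at least one residue, even for very short peptides.
    n_mask = max(1, round(len(tokens) * mask_fraction))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels


def masked_accuracy(predictions, labels):
    """Fraction of masked positions where the predicted residue is correct."""
    scored = [(p, l) for p, l in zip(predictions, labels) if l is not None]
    return sum(p == l for p, l in scored) / len(scored)


# Arbitrary example sequence used only to exercise the functions.
peptide = "GGAGHVPEYFVGIGTPISFYG"
masked, labels = mask_peptide(peptide)
print(" ".join(masked))
```

During training, a model sees only `masked` and is scored on how well it fills in the hidden residues, which is what `masked_accuracy` measures. The learned parameters can then be reused as the starting point for smaller property prediction models.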
By combining the Shukla group’s machine learning knowledge with experimental data collected by Mitchell’s group, the team applied LassoESM to several prediction tasks. One area of focus is the identification of compatible lasso peptide and lasso cyclase pairs to expand the clinical potential of these molecules. Lasso cyclases are the enzymes responsible for the knot-forming step of lasso peptide biosynthesis. Just as different locks require unique keys, different peptides require specific lasso cyclases to tie the characteristic knot.
“We built the models to predict which lasso cyclase could actually form a lasso peptide using only the sequence of amino acids in a peptide. If we can understand the substrate scope or we can engineer lasso cyclases, then we can potentially make any peptide into a lasso,” Shukla shared. Without LassoESM, these enzyme–substrate interactions are difficult to predict, highlighting the utility of this artificial intelligence tool.
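One common way to frame this kind of pairing problem is as binary classification over a combined representation of the enzyme and the peptide. The article does not describe the architecture the team used, so the sketch below is only an assumption about the general shape of such a model. The amino acid composition vector is a deliberately simple stand-in for a language model embedding, and the sequences, function names and untrained weights are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_embedding(sequence):
    """Stand-in for a language-model embedding: 20 amino acid frequencies."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid dividing by zero for an empty sequence
    return [c / total for c in counts]


def pair_features(cyclase_seq, peptide_seq):
    """Concatenate the enzyme and peptide representations into one vector."""
    return composition_embedding(cyclase_seq) + composition_embedding(peptide_seq)


def compatibility_score(features, weights, bias=0.0):
    """Logistic score in (0, 1); higher means a more likely compatible pair."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up sequences for illustration only; not real cyclase or peptide data.
cyclase = "MKTLLDAVGHRE"
peptide = "GGAGHVPEYFVGIGTPISFYG"

features = pair_features(cyclase, peptide)
untrained_weights = [0.0] * len(features)  # weights would be fitted on labeled pairs
score = compatibility_score(features, untrained_weights)
print(round(score, 2))
```

With all weights at zero the score is uninformative. In practice the weights would be fitted on experimentally labeled compatible and incompatible pairs, and the composition vector would be replaced by embeddings from a trained language model.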
Mi concluded: “We demonstrated that LassoESM enables accurate prediction of various lasso peptide properties, even with limited training data. This work provides a powerful AI-driven tool to accelerate the rational design of functional lasso peptides for biomedical and industrial applications.”
Moving forward, the team aims to extend the approach, for example by building tailor-made language models for other peptide natural products and by engineering lasso peptides to target specific proteins.
This article has been republished from the following materials. Material may have been edited for length and house style. For further information, please contact the cited source.