Novel tool for detecting hidden genetic mutations

A graph-based computational tool for detecting previously invisible genetic mutations has been developed.
Researchers at the University of California, Los Angeles (UCLA; USA) and the University of Toronto (Canada) have collaborated to develop moPepGen, an advanced computational tool for identifying previously hidden genetic mutations in proteins, unlocking new possibilities for cancer research, neurodegenerative disease research and more.
Proteogenomics – the combined study of genomics and proteomics – provides a comprehensive molecular profile of diseases. However, the field is limited by the difficulty of modeling the complexities of gene expression, and existing proteomic tools can fail to capture the full diversity of protein variations.
Achieving a high level of precision is critical as proteins can signal progression in diseases, such as cancer. However, analyzing proteins to detect these changes remains a challenge for computational tools. Now, researchers have developed moPepGen to address this gap; it’s a graph-based algorithm that enables more precise identification of protein sequence variations and subsequently generates non-canonical peptides in linear time.
“We developed moPepGen to help researchers determine which genetic variants are truly expressed at the protein level, addressing a long-standing challenge in the proteogenomic community,” commented co-first author Chenghao Zhu (UCLA).
moPepGen’s graph-based approach significantly improves the detection of previously invisible protein variations and can process all types of genetic changes, providing a more comprehensive view of protein diversity and a more accurate picture of disease-associated mutations.
Compared to existing tools, which typically detect genetic changes as a single amino acid substitution, moPepGen captures peptides that harbor any combination of small variants caused by alternative splicing, circular RNAs, gene fusions and gene editing, as well as other complex genetic modifications. Additionally, moPepGen can systematically model how genes are expressed and translated into proteins, thereby expanding the ability to detect mutations linked to disease.
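To make the idea of peptides carrying "any combination" of variants concrete, here is a minimal toy sketch in Python. It is not moPepGen's implementation or API: the function names, the example sequence and the variants are all hypothetical, it handles only single amino acid substitutions, and it enumerates every path by brute force (exponential in the number of variants) rather than attempting the linear-time behavior described above. It also assumes a simplified trypsin rule that cleaves after every K or R.

```python
from itertools import product


def build_variant_graph(reference, variants):
    """Toy variant graph: one list of allowed residues per position.

    `variants` is a list of (0-based position, alternative residue) pairs.
    Hypothetical helper for illustration only.
    """
    graph = [[aa] for aa in reference]
    for pos, alt in variants:
        if alt not in graph[pos]:
            graph[pos].append(alt)
    return graph


def tryptic_peptides(sequence):
    """Cleave after every K or R (simplified; ignores the proline rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def noncanonical_peptides(reference, variants):
    """Return peptides that appear only when at least one variant is applied."""
    canonical = set(tryptic_peptides(reference))
    found = set()
    # Brute force: walk every path through the graph (exponential in variants).
    for path in product(*build_variant_graph(reference, variants)):
        found.update(tryptic_peptides("".join(path)))
    return found - canonical


# Made-up reference with two made-up substitutions in the same peptide.
print(sorted(noncanonical_peptides("MAGTLKSR", [(2, "V"), (4, "P")])))
# ['MAGTPK', 'MAVTLK', 'MAVTPK']
```

The last peptide, MAVTPK, carries both substitutions at once, which is the kind of combination a tool that considers one substitution at a time would not generate.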
“Until now, there hasn’t been a practical way to handle the enormous complexity of genetic and transcriptomic variation,” shared Zhu. “The algorithm works rapidly, even when analyzing massive amounts of data, and is designed to function across multiple technologies and species.”
To test their model, the researchers used moPepGen to analyze proteogenomic data from five prostate tumors, eight kidney tumors and 375 cell lines. Their model was able to identify previously invisible protein variations associated with genetic mutations, gene fusions and other molecular changes. moPepGen was also able to detect four times more unique protein variants than existing tools, highlighting its more sensitive and comprehensive approach.
“By making it easier to analyze complex protein variations, moPepGen has the potential to advance research in cancer, neurodegenerative diseases, and other fields where understanding protein diversity is critical,” concluded Paul Boutros (UCLA). “It bridges the gap between genetic data and real-world protein expression, unlocking new possibilities in precision medicine and beyond.”
moPepGen is now freely available for researchers, and it can be integrated into existing proteomic analysis workflows, making it accessible to labs worldwide and capable of enhancing proteogenomic analyses for many applications.