FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking
Abstract Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based...
Main Authors: | Sophia Vincoff, Shrey Goel, Kseniia Kholina, Rishab Pulugurta, Pranay Vure, Pranam Chatterjee |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-02-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-025-56745-6 |
_version_ | 1823861847618486272 |
---|---|
author | Sophia Vincoff; Shrey Goel; Kseniia Kholina; Rishab Pulugurta; Pranay Vure; Pranam Chatterjee |
author_facet | Sophia Vincoff; Shrey Goel; Kseniia Kholina; Rishab Pulugurta; Pranay Vure; Pranam Chatterjee |
author_sort | Sophia Vincoff |
collection | DOAJ |
description | Abstract Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional protein features but have yet to be trained on fusion oncoprotein sequences. We introduce FusOn-pLM, a fine-tuned pLM trained on a newly curated, comprehensive set of fusion oncoprotein sequences, FusOn-DB. Employing a unique cosine-scheduled masked language modeling strategy, FusOn-pLM dynamically adjusts masking rates (15%–40%) to optimize feature extraction and representation quality, surpassing baseline embeddings in fusion-specific tasks, including localization, puncta formation, and disorder prediction. FusOn-pLM uniquely predicts drug-resistant mutations, providing insights for therapeutic design that anticipates resistance mechanisms. In total, FusOn-pLM provides biologically relevant representations for advancing therapeutic discovery in fusion-driven cancers. |
format | Article |
id | doaj-art-d8e2affe122f45738ced66d0738d10b8 |
institution | Kabale University |
issn | 2041-1723 |
language | English |
publishDate | 2025-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Nature Communications |
spelling | doaj-art-d8e2affe122f45738ced66d0738d10b8; 2025-02-09T12:45:18Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2025-02-01; 161111; 10.1038/s41467-025-56745-6; FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking; Sophia Vincoff (Department of Biomedical Engineering, Duke University); Shrey Goel (Department of Computer Science, Duke University); Kseniia Kholina (Department of Biomedical Engineering, Duke University); Rishab Pulugurta (Department of Biomedical Engineering, Duke University); Pranay Vure (Department of Biomedical Engineering, Duke University); Pranam Chatterjee (Department of Biomedical Engineering, Duke University); Abstract Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional protein features but have yet to be trained on fusion oncoprotein sequences. We introduce FusOn-pLM, a fine-tuned pLM trained on a newly curated, comprehensive set of fusion oncoprotein sequences, FusOn-DB. Employing a unique cosine-scheduled masked language modeling strategy, FusOn-pLM dynamically adjusts masking rates (15%–40%) to optimize feature extraction and representation quality, surpassing baseline embeddings in fusion-specific tasks, including localization, puncta formation, and disorder prediction. FusOn-pLM uniquely predicts drug-resistant mutations, providing insights for therapeutic design that anticipates resistance mechanisms. In total, FusOn-pLM provides biologically relevant representations for advancing therapeutic discovery in fusion-driven cancers.; https://doi.org/10.1038/s41467-025-56745-6 |
spellingShingle | Sophia Vincoff Shrey Goel Kseniia Kholina Rishab Pulugurta Pranay Vure Pranam Chatterjee FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking Nature Communications |
title | FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking |
title_full | FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking |
title_fullStr | FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking |
title_full_unstemmed | FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking |
title_short | FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking |
title_sort | fuson plm a fusion oncoprotein specific language model via adjusted rate masking |
url | https://doi.org/10.1038/s41467-025-56745-6 |
work_keys_str_mv | AT sophiavincoff fusonplmafusiononcoproteinspecificlanguagemodelviaadjustedratemasking AT shreygoel fusonplmafusiononcoproteinspecificlanguagemodelviaadjustedratemasking AT kseniiakholina fusonplmafusiononcoproteinspecificlanguagemodelviaadjustedratemasking AT rishabpulugurta fusonplmafusiononcoproteinspecificlanguagemodelviaadjustedratemasking AT pranayvure fusonplmafusiononcoproteinspecificlanguagemodelviaadjustedratemasking AT pranamchatterjee fusonplmafusiononcoproteinspecificlanguagemodelviaadjustedratemasking |
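
Illustrative note: the description field above mentions a cosine-scheduled masked language modeling strategy with masking rates between 15% and 40%. The record does not give the schedule's formula, its direction, or the tokenizer, so the following is only a minimal sketch under stated assumptions: the rate is assumed to rise from 15% to 40% along a half-cosine curve over training, and the names `cosine_mask_rate`, `mask_sequence`, and `MASK_TOKEN` are hypothetical, not taken from the article or its code.

```python
import math
import random

MASK_TOKEN = "<mask>"  # hypothetical placeholder; the real tokenizer's mask token may differ


def cosine_mask_rate(step, total_steps, low=0.15, high=0.40):
    """Masking rate at a given training step.

    Assumption: a half-cosine ramp from `low` (step 0) to `high` (final step).
    The record states only the 15%-40% range and that the schedule is cosine-based.
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress clamped to [0, 1]
    return low + (high - low) * 0.5 * (1.0 - math.cos(math.pi * t))


def mask_sequence(residues, rate, rng):
    """Replace a random fraction `rate` of residues with MASK_TOKEN."""
    if not residues:
        return []
    n_mask = max(1, round(rate * len(residues)))  # always mask at least one position
    positions = set(rng.sample(range(len(residues)), n_mask))
    return [MASK_TOKEN if i in positions else aa for i, aa in enumerate(residues)]


if __name__ == "__main__":
    rng = random.Random(0)
    seq = list("MKTAYIAKQRQISFVKSHFS")  # made-up 20-residue example sequence
    for step in (0, 50, 100):
        rate = cosine_mask_rate(step, total_steps=100)
        masked = mask_sequence(seq, rate, rng)
        print(step, round(rate, 3), masked.count(MASK_TOKEN))
```

In this sketch the rate is recomputed per step and applied to each sequence independently; a real training loop would more likely apply it per batch inside a data collator.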