FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking

Bibliographic Details
Main Authors: Sophia Vincoff, Shrey Goel, Kseniia Kholina, Rishab Pulugurta, Pranay Vure, Pranam Chatterjee
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-56745-6
Collection: DOAJ
Abstract: Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional protein features but have yet to be trained on fusion oncoprotein sequences. We introduce FusOn-pLM, a fine-tuned pLM trained on a newly curated, comprehensive set of fusion oncoprotein sequences, FusOn-DB. Employing a unique cosine-scheduled masked language modeling strategy, FusOn-pLM dynamically adjusts masking rates (15%–40%) to optimize feature extraction and representation quality, surpassing baseline embeddings in fusion-specific tasks, including localization, puncta formation, and disorder prediction. FusOn-pLM uniquely predicts drug-resistant mutations, providing insights for therapeutic design that anticipates resistance mechanisms. In total, FusOn-pLM provides biologically relevant representations for advancing therapeutic discovery in fusion-driven cancers.
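The abstract describes a masking rate that varies between 15% and 40% on a cosine schedule. The following is a minimal sketch of what such a schedule could look like, not the authors' implementation: the function names `cosine_mask_rate` and `mask_sequence`, the rising direction of the schedule, the per-step granularity, and the `<mask>` token string are all assumptions made for illustration.

```python
import math
import random


def cosine_mask_rate(step, total_steps, low=0.15, high=0.40):
    """Hypothetical schedule: rise from `low` to `high` along a half cosine.

    The 15%-40% bounds come from the abstract; the direction and shape
    of the curve are assumptions.
    """
    if total_steps < 2:
        return low
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return low + 0.5 * (high - low) * (1.0 - math.cos(math.pi * progress))


def mask_sequence(seq, rate, mask_token="<mask>", rng=None):
    """Replace roughly `rate` of the residues with a mask token (assumed name)."""
    rng = rng or random.Random()
    n_mask = max(1, round(rate * len(seq)))
    positions = set(rng.sample(range(len(seq)), n_mask))
    return [mask_token if i in positions else aa for i, aa in enumerate(seq)]


if __name__ == "__main__":
    total = 11
    for step in (0, 5, 10):
        print(step, round(cosine_mask_rate(step, total), 3))
    print(mask_sequence("MKTAYIAKQRQISFVKSHFS", 0.25, rng=random.Random(0)))
```

Under these assumptions the rate starts at the lower bound and ends at the upper bound, so early training steps hide fewer residues than later ones; a decreasing schedule would only require swapping `low` and `high`.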
Record ID: doaj-art-d8e2affe122f45738ced66d0738d10b8
Institution: Kabale University
ISSN: 2041-1723
Volume: 16
Issue: 1
Author Affiliations:
Sophia Vincoff: Department of Biomedical Engineering, Duke University
Shrey Goel: Department of Computer Science, Duke University
Kseniia Kholina: Department of Biomedical Engineering, Duke University
Rishab Pulugurta: Department of Biomedical Engineering, Duke University
Pranay Vure: Department of Biomedical Engineering, Duke University
Pranam Chatterjee: Department of Biomedical Engineering, Duke University