PIPENN-EMB ensemble net and protein embeddings generalise protein interface prediction beyond homology
Abstract Protein interactions are crucial for understanding biological functions and disease mechanisms, but predicting these remains a complex task in computational biology. Increasingly, Deep Learning models are having success in interface prediction. This study presents PIPENN-EMB which explores...
Main Authors: | David P. G. Thomas, Carlos M. Garcia Fernandez, Reza Haydarlou, K. Anton Feenstra |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-02-01 |
Series: | Scientific Reports |
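Subjects: | Protein interface prediction; Protein–protein interactions; Sequence-based prediction; PPI; Embedding |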
Online Access: | https://doi.org/10.1038/s41598-025-88445-y |
_version_ | 1823862483437223936 |
---|---|
author | David P. G. Thomas; Carlos M. Garcia Fernandez; Reza Haydarlou; K. Anton Feenstra |
author_facet | David P. G. Thomas; Carlos M. Garcia Fernandez; Reza Haydarlou; K. Anton Feenstra |
author_sort | David P. G. Thomas |
collection | DOAJ |
description | Abstract Protein interactions are crucial for understanding biological functions and disease mechanisms, but predicting these remains a complex task in computational biology. Increasingly, Deep Learning models are having success in interface prediction. This study presents PIPENN-EMB which explores the added value of using embeddings from the ProtT5-XL protein language model. Our results show substantial improvement over the previously published PIPENN model for protein interaction interface prediction, reaching an MCC of 0.313 vs. 0.249, and AUROC 0.800 vs. 0.755 on the BIO_DL_TE test set. We furthermore show that these embeddings cover a broad range of ‘hand-crafted’ protein features in ablation studies. PIPENN-EMB reaches state-of-the-art performance on the ZK448 dataset for protein-protein interface prediction. We showcase predictions on 25 resistance-related proteins from Mycobacterium tuberculosis. Furthermore, whereas other state-of-the-art sequence-based methods perform worse for proteins that have little recognisable homology in their training data, PIPENN-EMB generalises to remote homologs, yielding stable AUROC across all three test sets with less than 30% sequence identity to the training dataset, and even to proteins with less than 15% sequence identity. |
format | Article |
id | doaj-art-69b97be2db56444ea6103295df9bb699 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-69b97be2db56444ea6103295df9bb699; 2025-02-09T12:30:22Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-02-01; 151110; 10.1038/s41598-025-88445-y; PIPENN-EMB ensemble net and protein embeddings generalise protein interface prediction beyond homology; David P. G. Thomas 0; Carlos M. Garcia Fernandez 1; Reza Haydarlou 2; K. Anton Feenstra 3; Department of Computer Science, Vrije Universiteit Amsterdam; Department of Computer Science, Vrije Universiteit Amsterdam; Department of Computer Science, Vrije Universiteit Amsterdam; Department of Computer Science, Vrije Universiteit Amsterdam; Abstract Protein interactions are crucial for understanding biological functions and disease mechanisms, but predicting these remains a complex task in computational biology. Increasingly, Deep Learning models are having success in interface prediction. This study presents PIPENN-EMB which explores the added value of using embeddings from the ProtT5-XL protein language model. Our results show substantial improvement over the previously published PIPENN model for protein interaction interface prediction, reaching an MCC of 0.313 vs. 0.249, and AUROC 0.800 vs. 0.755 on the BIO_DL_TE test set. We furthermore show that these embeddings cover a broad range of ‘hand-crafted’ protein features in ablation studies. PIPENN-EMB reaches state-of-the-art performance on the ZK448 dataset for protein-protein interface prediction. We showcase predictions on 25 resistance-related proteins from Mycobacterium tuberculosis. Furthermore, whereas other state-of-the-art sequence-based methods perform worse for proteins that have little recognisable homology in their training data, PIPENN-EMB generalises to remote homologs, yielding stable AUROC across all three test sets with less than 30% sequence identity to the training dataset, and even to proteins with less than 15% sequence identity.; https://doi.org/10.1038/s41598-025-88445-y; Protein interface prediction; Protein–protein interactions; Sequence-based prediction; PPI; Embedding |
title | PIPENN-EMB ensemble net and protein embeddings generalise protein interface prediction beyond homology |
topic | Protein interface prediction; Protein–protein interactions; Sequence-based prediction; PPI; Embedding |
url | https://doi.org/10.1038/s41598-025-88445-y |