PIPENN-EMB ensemble net and protein embeddings generalise protein interface prediction beyond homology

Bibliographic Details
Main Authors: David P. G. Thomas, Carlos M. Garcia Fernandez, Reza Haydarlou, K. Anton Feenstra
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
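Series: Scientific Reports
Subjects: Protein interface prediction; Protein–protein interactions; Sequence-based prediction; PPI; Embedding
Online Access: https://doi.org/10.1038/s41598-025-88445-y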
collection DOAJ
description Abstract: Protein interactions are crucial for understanding biological functions and disease mechanisms, but predicting them remains a complex task in computational biology. Increasingly, deep learning models are having success in interface prediction. This study presents PIPENN-EMB, which explores the added value of using embeddings from the ProtT5-XL protein language model. Our results show substantial improvement over the previously published PIPENN model for protein interaction interface prediction, reaching an MCC of 0.313 vs. 0.249 and an AUROC of 0.800 vs. 0.755 on the BIO_DL_TE test set. We furthermore show in ablation studies that these embeddings cover a broad range of ‘hand-crafted’ protein features. PIPENN-EMB reaches state-of-the-art performance on the ZK448 dataset for protein-protein interface prediction. We showcase predictions on 25 resistance-related proteins from Mycobacterium tuberculosis. Furthermore, whereas other state-of-the-art sequence-based methods perform worse for proteins that have little recognisable homology in their training data, PIPENN-EMB generalises to remote homologs, yielding stable AUROC across all three test sets with less than 30% sequence identity to the training dataset, and even to proteins with less than 15% sequence identity.
id doaj-art-69b97be2db56444ea6103295df9bb699
institution Kabale University
issn 2045-2322
spelling doaj-art-69b97be2db56444ea6103295df9bb699; 2025-02-09T12:30:22Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-02-01; vol. 15, no. 1, pp. 1-10; doi:10.1038/s41598-025-88445-y
Author affiliation (all four authors): Department of Computer Science, Vrije Universiteit Amsterdam
topic Protein interface prediction
Protein–protein interactions
Sequence-based prediction
PPI
Embedding