GS-DTA: integrating graph and sequence models for predicting drug-target binding affinity
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2025-02-01 |
Series: | BMC Genomics |
Subjects: | |
Online Access: | https://doi.org/10.1186/s12864-025-11234-4 |
Summary: | Abstract Background Drug-target binding affinity (DTA) prediction is vital in drug discovery and repositioning, and it is attracting growing research attention. Many effective methods have been proposed. However, some current methods fall short in focusing on important nodes in drug molecular graphs and in handling molecules with complex structures. In particular, when considering important nodes and complex substructures in molecules, they may not fully explore the potential relationships between different parts. In addition, when dealing with protein structures, some methods ignore the connections between amino acid fragments that are far apart in sequence but may act synergistically in function. Results In this paper, we propose a new method, called GS-DTA, for predicting DTA based on graph and sequence models. GS-DTA takes the simplified molecular input line entry system (SMILES) string of the drug and the amino acid sequence of the protein as input. First, each drug is modeled as a graph in which a vertex is an atom and an edge represents an interaction between atoms. Then GATv2-GCN and three-layer GCN networks are used to extract the features of the drug. GATv2-GCN enhances the model’s ability to focus on important nodes by assigning dynamic attention scores, which improves the learning of intricate patterns in the graph structure. In addition, the three-layer GCN captures hierarchical features of the drug through deeper propagation and feature transformation. Meanwhile, for each protein, a framework combining CNN, Bi-LSTM, and Transformer is used to extract the contextual and structural information of the amino acid sequence; this combination helps capture comprehensive and detailed features of the protein. Finally, the obtained drug and protein feature vectors are combined to predict DTA through the fully connected layer. The source code can be downloaded from https://github.com/zhuziguang/GS-DTA.
Conclusions The results show that GS-DTA achieves good performance in terms of MSE, CI, and r_m^2 on the Davis and KIBA datasets, improving the accuracy of DTA prediction. |
---|---|
ISSN: | 1471-2164 |
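
The summary reports performance in terms of MSE and CI. As a rough illustration of what these two metrics measure, the sketch below computes the mean squared error and the concordance index in their commonly used forms. It is not taken from the GS-DTA repository: the function names are hypothetical, and the affinity values in the usage example are made-up toy numbers, not results from the paper.

```python
from itertools import combinations


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    A pair is comparable when its two true affinities differ.
    A tie in the predictions for a comparable pair counts as 0.5.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # equal true affinities carry no ordering information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


if __name__ == "__main__":
    # Toy values for illustration only (hypothetical, not from Davis or KIBA).
    measured = [5.0, 6.2, 7.1, 8.4]
    predicted = [5.3, 6.0, 7.5, 7.4]
    print("MSE:", mean_squared_error(measured, predicted))
    print("CI: ", concordance_index(measured, predicted))
```

A lower MSE means predictions are numerically closer to the measured affinities, while a CI closer to 1 means the model ranks drug-target pairs in the correct order. The pairwise loop is quadratic in the number of samples, which is fine for a sketch but slow on full benchmark datasets.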