Direct coupling analysis and the attention mechanism
Abstract: Proteins are involved in nearly all cellular functions, encompassing roles in transport, signaling, enzymatic activity, and more. Their functionalities crucially depend on their complex three-dimensional arrangement. For this reason, being able to predict their structure from the amino acid...
Main Authors: | Francesco Caredda, Andrea Pagnani |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2025-02-01 |
Series: | BMC Bioinformatics |
Subjects: | Protein structure prediction, Attention mechanism, Direct coupling analysis, Transformer |
Online Access: | https://doi.org/10.1186/s12859-025-06062-y |
_version_ | 1823861562793787392 |
---|---|
author | Francesco Caredda; Andrea Pagnani |
collection | DOAJ |
description | Abstract: Proteins are involved in nearly all cellular functions, encompassing roles in transport, signaling, enzymatic activity, and more. Their functionalities crucially depend on their complex three-dimensional arrangement. For this reason, being able to predict their structure from the amino acid sequence has been and still is a phenomenal computational challenge that the introduction of AlphaFold solved with unprecedented accuracy. However, the inherent complexity of AlphaFold’s architectures makes it challenging to understand the rules that ultimately shape the protein’s predicted structure. This study investigates a single-layer unsupervised model based on the attention mechanism. More precisely, we explore a Direct Coupling Analysis (DCA) method that mimics the attention mechanism of several popular Transformer architectures, such as AlphaFold itself. The model’s parameters, notably fewer than those in standard DCA-based algorithms, can be directly used for extracting structural determinants such as the contact map of the protein family under study. Additionally, the functional form of the energy function of the model enables us to deploy a multi-family learning strategy, allowing us to effectively integrate information across multiple protein families, whereas standard DCA algorithms are typically limited to single protein families. Finally, we implemented a generative version of the model using an autoregressive architecture, capable of efficiently generating new proteins in silico. |
format | Article |
id | doaj-art-61a6ed9a897b4305b185a8e6227a7670 |
institution | Kabale University |
issn | 1471-2105 |
language | English |
publishDate | 2025-02-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj-art-61a6ed9a897b4305b185a8e6227a7670; 2025-02-09T12:56:56Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-02-01; 26(1), 1-21; 10.1186/s12859-025-06062-y; Direct coupling analysis and the attention mechanism; Francesco Caredda (DISAT, Politecnico di Torino); Andrea Pagnani (DISAT, Politecnico di Torino); https://doi.org/10.1186/s12859-025-06062-y; Protein structure prediction; Attention mechanism; Direct coupling analysis; Transformer |
title | Direct coupling analysis and the attention mechanism |
topic | Protein structure prediction; Attention mechanism; Direct coupling analysis; Transformer |
url | https://doi.org/10.1186/s12859-025-06062-y |