Direct coupling analysis and the attention mechanism

Abstract Proteins are involved in nearly all cellular functions, encompassing roles in transport, signaling, enzymatic activity, and more. Their functionalities crucially depend on their complex three-dimensional arrangement. For this reason, being able to predict their structure from the amino acid...

Bibliographic Details
Main Authors: Francesco Caredda, Andrea Pagnani
Format: Article
Language: English
Published: BMC 2025-02-01
Series: BMC Bioinformatics
Subjects: Protein structure prediction, Attention mechanism, Direct coupling analysis, Transformer
Online Access: https://doi.org/10.1186/s12859-025-06062-y
author Francesco Caredda
Andrea Pagnani
collection DOAJ
description Abstract Proteins are involved in nearly all cellular functions, encompassing roles in transport, signaling, enzymatic activity, and more. Their functionalities crucially depend on their complex three-dimensional arrangement. For this reason, being able to predict their structure from the amino acid sequence has been and still is a phenomenal computational challenge that the introduction of AlphaFold solved with unprecedented accuracy. However, the inherent complexity of AlphaFold’s architectures makes it challenging to understand the rules that ultimately shape the protein’s predicted structure. This study investigates a single-layer unsupervised model based on the attention mechanism. More precisely, we explore a Direct Coupling Analysis (DCA) method that mimics the attention mechanism of several popular Transformer architectures, such as AlphaFold itself. The model’s parameters, notably fewer than those in standard DCA-based algorithms, can be directly used for extracting structural determinants such as the contact map of the protein family under study. Additionally, the functional form of the energy function of the model enables us to deploy a multi-family learning strategy, allowing us to effectively integrate information across multiple protein families, whereas standard DCA algorithms are typically limited to single protein families. Finally, we implemented a generative version of the model using an autoregressive architecture, capable of efficiently generating new proteins in silico.
format Article
id doaj-art-61a6ed9a897b4305b185a8e6227a7670
institution Kabale University
issn 1471-2105
language English
publishDate 2025-02-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj-art-61a6ed9a897b4305b185a8e6227a7670 | 2025-02-09T12:56:56Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2025-02-01 | 261121 | 10.1186/s12859-025-06062-y | Direct coupling analysis and the attention mechanism | Francesco Caredda (DISAT, Politecnico di Torino) | Andrea Pagnani (DISAT, Politecnico di Torino) | https://doi.org/10.1186/s12859-025-06062-y | Protein structure prediction | Attention mechanism | Direct coupling analysis | Transformer
title Direct coupling analysis and the attention mechanism
topic Protein structure prediction
Attention mechanism
Direct coupling analysis
Transformer
url https://doi.org/10.1186/s12859-025-06062-y