Efficient Image Super-Resolution with Multi-Branch Mixer Transformer
Deep learning methods have demonstrated significant advances in single image super-resolution (SISR), with Transformer-based models frequently outperforming their CNN-based counterparts. However, due to the self-attention mechanism in Transformers, achieving lightweight models remains challenging compared to CNN-based approaches.
Main Authors: | Long Zhang, Yi Wan |
---|---|
Format: | Article |
Language: | English |
Published: | Slovenian Society for Stereology and Quantitative Image Analysis, 2025-02-01 |
Series: | Image Analysis and Stereology |
Subjects: | active token mixer; multi-branch token mixer; single image super-resolution; transformer |
Online Access: | https://www.ias-iss.org/ojs/IAS/article/view/3399 |
_version_ | 1823858112770080768 |
---|---|
author | Long Zhang; Yi Wan |
author_facet | Long Zhang; Yi Wan |
author_sort | Long Zhang |
collection | DOAJ |
description |
Deep learning methods have demonstrated significant advances in single image super-resolution (SISR), with Transformer-based models frequently outperforming their CNN-based counterparts. However, due to the self-attention mechanism in Transformers, achieving lightweight models remains challenging compared to CNN-based approaches. In this paper, we propose a lightweight Transformer model termed Multi-Branch Mixer Transformer (MBMT) for SR. The design of MBMT is motivated by two main considerations. First, while self-attention excels at capturing long-range dependencies in features, it struggles to extract local features. Second, the quadratic complexity of self-attention poses a significant challenge to building lightweight models. To address these problems, we propose a Multi-Branch Token Mixer (MBTM) to extract richer global and local information. Specifically, MBTM consists of three parts: shifted window attention, depthwise convolution, and an active token mixer. This multi-branch structure handles long-range dependencies and local features simultaneously, enabling excellent SR performance with just a few stacked modules. Experimental results demonstrate that MBMT achieves performance competitive with SOTA methods while maintaining model efficiency.
|
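The abstract above outlines a three-branch token mixer that combines shifted window attention, depthwise convolution, and an active token mixer. The PyTorch sketch below only illustrates that parallel-branch idea, not the authors' implementation: the summation-based fusion, window size, head count, and the simplified stand-in for the active token mixer are assumptions, the window attention omits the shift between successive blocks, and the class names (MultiBranchTokenMixer, WindowAttention, SimplifiedActiveTokenMixer) are hypothetical.

```python
# Minimal sketch of a three-branch token mixer in the spirit of the MBTM
# described above. Everything below is illustrative: fusion by summation,
# the window size, the head count, and the simplified stand-in for the
# active token mixer are assumptions, not the paper's design.
import torch
import torch.nn as nn


class SimplifiedActiveTokenMixer(nn.Module):
    """Hypothetical stand-in: mixes tokens along H and W with per-channel
    depthwise 1-D convolutions, then fuses channels with a pointwise conv."""

    def __init__(self, dim: int):
        super().__init__()
        self.mix_h = nn.Conv2d(dim, dim, kernel_size=(7, 1), padding=(3, 0), groups=dim)
        self.mix_w = nn.Conv2d(dim, dim, kernel_size=(1, 7), padding=(0, 3), groups=dim)
        self.fuse = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(self.mix_h(x) + self.mix_w(x))


class WindowAttention(nn.Module):
    """Multi-head self-attention inside non-overlapping windows.
    The shift between successive blocks is omitted for brevity."""

    def __init__(self, dim: int, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s = self.window  # assumes h and w are divisible by the window size
        # Partition the feature map into (s x s) windows: (B * num_windows, s*s, C)
        win = x.reshape(b, c, h // s, s, w // s, s).permute(0, 2, 4, 3, 5, 1)
        win = win.reshape(-1, s * s, c)
        out, _ = self.attn(win, win, win)
        # Reverse the window partition back to (B, C, H, W)
        out = out.reshape(b, h // s, w // s, s, s, c).permute(0, 5, 1, 3, 2, 4)
        return out.reshape(b, c, h, w)


class MultiBranchTokenMixer(nn.Module):
    """Three parallel branches (window attention, depthwise convolution,
    token mixer) applied to the same feature map; outputs summed and projected."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn = WindowAttention(dim)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.atm = SimplifiedActiveTokenMixer(dim)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.attn(x) + self.dwconv(x) + self.atm(x))


if __name__ == "__main__":
    feats = torch.randn(1, 32, 64, 64)             # B, C, H, W
    print(MultiBranchTokenMixer(32)(feats).shape)  # torch.Size([1, 32, 64, 64])
```

In the paper itself the branches may be fused differently (e.g., by concatenation or learned weighting), and an active token mixer typically learns where to fetch tokens from rather than using fixed 1-D mixing; the sketch only conveys how three complementary mixers can operate on the same feature map in parallel.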
format | Article |
id | doaj-art-43a60febf5124b28909dce7c73d481cb |
institution | Kabale University |
issn | 1580-3139; 1854-5165 |
language | English |
publishDate | 2025-02-01 |
publisher | Slovenian Society for Stereology and Quantitative Image Analysis |
record_format | Article |
series | Image Analysis and Stereology |
spelling | doaj-art-43a60febf5124b28909dce7c73d481cb; 2025-02-11T14:22:23Z; eng; Slovenian Society for Stereology and Quantitative Image Analysis; Image Analysis and Stereology; 1580-3139; 1854-5165; 2025-02-01; 10.5566/ias.3399; Efficient Image Super-Resolution with Multi-Branch Mixer Transformer; Long Zhang (https://orcid.org/0009-0007-4245-1052); Yi Wan; Lanzhou University; Deep learning methods have demonstrated significant advances in single image super-resolution (SISR), with Transformer-based models frequently outperforming their CNN-based counterparts. However, due to the self-attention mechanism in Transformers, achieving lightweight models remains challenging compared to CNN-based approaches. In this paper, we propose a lightweight Transformer model termed Multi-Branch Mixer Transformer (MBMT) for SR. The design of MBMT is motivated by two main considerations. First, while self-attention excels at capturing long-range dependencies in features, it struggles to extract local features. Second, the quadratic complexity of self-attention poses a significant challenge to building lightweight models. To address these problems, we propose a Multi-Branch Token Mixer (MBTM) to extract richer global and local information. Specifically, MBTM consists of three parts: shifted window attention, depthwise convolution, and an active token mixer. This multi-branch structure handles long-range dependencies and local features simultaneously, enabling excellent SR performance with just a few stacked modules. Experimental results demonstrate that MBMT achieves performance competitive with SOTA methods while maintaining model efficiency. https://www.ias-iss.org/ojs/IAS/article/view/3399; active token mixer; multi-branch token mixer; single image super-resolution; transformer |
spellingShingle | Long Zhang; Yi Wan; Efficient Image Super-Resolution with Multi-Branch Mixer Transformer; Image Analysis and Stereology; active token mixer; multi-branch token mixer; single image super-resolution; transformer |
title | Efficient Image Super-Resolution with Multi-Branch Mixer Transformer |
title_full | Efficient Image Super-Resolution with Multi-Branch Mixer Transformer |
title_fullStr | Efficient Image Super-Resolution with Multi-Branch Mixer Transformer |
title_full_unstemmed | Efficient Image Super-Resolution with Multi-Branch Mixer Transformer |
title_short | Efficient Image Super-Resolution with Multi-Branch Mixer Transformer |
title_sort | efficient image super resolution with multi branch mixer transformer |
topic | active token mixer; multi-branch token mixer; single image super-resolution; transformer |
url | https://www.ias-iss.org/ojs/IAS/article/view/3399 |
work_keys_str_mv | AT longzhang efficientimagesuperresolutionwithmultibranchmixertransformer AT yiwan efficientimagesuperresolutionwithmultibranchmixertransformer |