Latent space improved masked reconstruction model for human skeleton-based action recognition

Human skeleton-based action recognition is an important task in computer vision. In recent years, the masked autoencoder (MAE) has been applied in various fields owing to its strong self-supervised learning ability and has achieved good results in masked data reconstruction tasks. However, in...

Bibliographic Details
Main Authors:	Enqing Chen, Xueting Wang, Xin Guo, Ying Zhu, Dong Li
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-02-01
Series:	Frontiers in Neurorobotics
Subjects:	human skeleton-based action recognition; variational autoencoder; vector quantized variational autoencoder; masked reconstruction model; self-supervised learning
Online Access:	https://www.frontiersin.org/articles/10.3389/fnbot.2025.1482281/full

_version_	1823856499659636736
author	Enqing Chen; Xueting Wang; Xin Guo; Ying Zhu; Dong Li
author_sort	Enqing Chen
collection	DOAJ
description	Human skeleton-based action recognition is an important task in computer vision. In recent years, the masked autoencoder (MAE) has been applied in various fields owing to its strong self-supervised learning ability and has achieved good results in masked data reconstruction tasks. However, in visual classification tasks such as action recognition, the encoder in the autoencoder structure has a limited ability to learn features, which results in poor classification performance. We propose to enhance the encoder's feature extraction ability in classification tasks by leveraging the latent space of the variational autoencoder (VAE), and further to replace it with the latent space of the vector quantized variational autoencoder (VQVAE). The resulting models are called SkeletonMVAE and SkeletonMVQVAE, respectively. In SkeletonMVAE, we constrain the latent variables to represent features in the form of distributions. In SkeletonMVQVAE, we discretize the latent variables. These changes help the encoder learn deeper data structures and more discriminative and generalized feature representations. Experimental results on the NTU-60 and NTU-120 datasets demonstrate that the proposed method effectively improves the encoder's classification accuracy and its generalization ability when few labeled data are available. SkeletonMVAE exhibits stronger classification ability, while SkeletonMVQVAE exhibits stronger generalization when fewer labeled data are available.
format	Article
id	doaj-art-65f47d78ba614d8294971e0d8ef11b1f
institution	Kabale University
issn	1662-5218
language	English
publishDate	2025-02-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Neurorobotics
spelling	doaj-art-65f47d78ba614d8294971e0d8ef11b1f | 2025-02-12T07:26:45Z | eng | Frontiers Media S.A. | Frontiers in Neurorobotics | 1662-5218 | 2025-02-01 | 19 | 10.3389/fnbot.2024.1482281 | 1482281 | Latent space improved masked reconstruction model for human skeleton-based action recognition | Enqing Chen (0); Xueting Wang (1); Xin Guo (2); Ying Zhu (3); Dong Li (4) | School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China; School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China; School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China; State Grid Henan Electric Power Company Information and Communication Branch, Zhengzhou, China; State Grid Henan Electric Power Company Information and Communication Branch, Zhengzhou, China | Human skeleton-based action recognition is an important task in the field of computer vision. In recent years, masked autoencoder (MAE) has been used in various fields due to its powerful self-supervised learning ability and has achieved good results in masked data reconstruction tasks. However, in visual classification tasks such as action recognition, the limited ability of the encoder to learn features in the autoencoder structure results in poor classification performance. We propose to enhance the encoder's feature extraction ability in classification tasks by leveraging the latent space of variational autoencoder (VAE) and further replace it with the latent space of vector quantized variational autoencoder (VQVAE). The constructed models are called SkeletonMVAE and SkeletonMVQVAE, respectively. In SkeletonMVAE, we constrain the latent variables to represent features in the form of distributions. In SkeletonMVQVAE, we discretize the latent variables. These help the encoder learn deeper data structures and more discriminative and generalized feature representations. The experiment results on the NTU-60 and NTU-120 datasets demonstrate that our proposed method can effectively improve the classification accuracy of the encoder in classification tasks and its generalization ability in the case of few labeled data. SkeletonMVAE exhibits stronger classification ability, while SkeletonMVQVAE exhibits stronger generalization in situations with fewer labeled data. | https://www.frontiersin.org/articles/10.3389/fnbot.2025.1482281/full | human skeleton-based action recognition; variational autoencoder; vector quantized variational autoencoder; masked reconstruction model; self-supervised learning
title	Latent space improved masked reconstruction model for human skeleton-based action recognition
topic	human skeleton-based action recognition; variational autoencoder; vector quantized variational autoencoder; masked reconstruction model; self-supervised learning
url	https://www.frontiersin.org/articles/10.3389/fnbot.2025.1482281/full

Similar Items