DLiGRU-X: Efficient X-Vector-Based Embeddings for Small-Footprint Keyword Spotting System

Deployment of deep learning-based speech processing models for real-world applications on devices with limited processing capacity and memory constraints poses significant challenges. This paper introduces an enhanced deep learning model based on the X-vector architecture, named dilated light-gated...

Bibliographic Details
Main Authors:	Zong-En Wu, Shao-Jung Chan, Yeshanew Ale Wubet, Kuang-Yow Lian
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Dilated-LiGRU, deep learning, GRU, keyword spotting, X-vector
Online Access:	https://ieeexplore.ieee.org/document/10858118/

_version_	1823859593647751168
author	Zong-En Wu, Shao-Jung Chan, Yeshanew Ale Wubet, Kuang-Yow Lian
author_facet	Zong-En Wu, Shao-Jung Chan, Yeshanew Ale Wubet, Kuang-Yow Lian
author_sort	Zong-En Wu
collection	DOAJ
description	Deployment of deep learning-based speech processing models for real-world applications on devices with limited processing capacity and memory constraints poses significant challenges. This paper introduces an enhanced deep learning model based on the X-vector architecture, named dilated light-gated recurrent unit in X-vector (DLiGRU-X) to address these challenges specifically for small-footprint keyword spotting (KWS) tasks. The DLiGRU-X model enhances temporal feature extraction with a LiGRU and reduces computational complexity through dilated convolution techniques. The proposed model efficiently learns speech signal characteristics, making it suitable for scenarios with limited hardware resources and can handle an expanded vocabulary size of keyword identification. The proposed model is validated on the Google Speech Command public dataset, and its performance is compared with other recently proposed deep learning models for KWS. The proposed model achieves an excellent trade-off between recognition accuracy and computational complexity, outperforming various advanced keyword spotting models. Notably, despite a reduction in model parameters, DLiGRU-X maintains an accuracy of 97% without significant decline. This model offers greater flexibility compared to previous models, allowing users to adjust and expand the set of targeted vocabulary according to their needs and deploy the model in resource-constrained environments.
format	Article
id	doaj-art-19207b5a2de7499e92091379ed0edba2
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-19207b5a2de7499e92091379ed0edba2 | 2025-02-11T00:00:47Z | eng | IEEE | IEEE Access | 2169-3536 | 2025-01-01 | 13 | 23498 | 23507 | 10.1109/ACCESS.2025.3536470 | 10858118 | DLiGRU-X: Efficient X-Vector-Based Embeddings for Small-Footprint Keyword Spotting System | Zong-En Wu 0 | Shao-Jung Chan 1 | https://orcid.org/0009-0000-9203-9305 | Yeshanew Ale Wubet 2 | https://orcid.org/0000-0002-1411-715X | Kuang-Yow Lian 3 | https://orcid.org/0000-0002-5692-9279 | Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan | Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan | Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan | Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan | Deployment of deep learning-based speech processing models for real-world applications on devices with limited processing capacity and memory constraints poses significant challenges. This paper introduces an enhanced deep learning model based on the X-vector architecture, named dilated light-gated recurrent unit in X-vector (DLiGRU-X) to address these challenges specifically for small-footprint keyword spotting (KWS) tasks. The DLiGRU-X model enhances temporal feature extraction with a LiGRU and reduces computational complexity through dilated convolution techniques. The proposed model efficiently learns speech signal characteristics, making it suitable for scenarios with limited hardware resources and can handle an expanded vocabulary size of keyword identification. The proposed model is validated on the Google Speech Command public dataset, and its performance is compared with other recently proposed deep learning models for KWS. The proposed model achieves an excellent trade-off between recognition accuracy and computational complexity, outperforming various advanced keyword spotting models. Notably, despite a reduction in model parameters, DLiGRU-X maintains an accuracy of 97% without significant decline. This model offers greater flexibility compared to previous models, allowing users to adjust and expand the set of targeted vocabulary according to their needs and deploy the model in resource-constrained environments. | https://ieeexplore.ieee.org/document/10858118/ | Dilated-LiGRU | deep learning | GRU | keyword spotting | X-vector
title	DLiGRU-X: Efficient X-Vector-Based Embeddings for Small-Footprint Keyword Spotting System
title_sort	dligru x efficient x vector based embeddings for small footprint keyword spotting system
topic	Dilated-LiGRU, deep learning, GRU, keyword spotting, X-vector
url	https://ieeexplore.ieee.org/document/10858118/

Similar Items