DLiGRU-X: Efficient X-Vector-Based Embeddings for Small-Footprint Keyword Spotting System
Deployment of deep learning-based speech processing models for real-world applications on devices with limited processing capacity and memory constraints poses significant challenges. This paper introduces an enhanced deep learning model based on the X-vector architecture, named dilated light-gated...
Main Authors: | Zong-En Wu, Shao-Jung Chan, Yeshanew Ale Wubet, Kuang-Yow Lian |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
Subjects: | Dilated-LiGRU, deep learning, GRU, keyword spotting, X-vector |
Online Access: | https://ieeexplore.ieee.org/document/10858118/ |
_version_ | 1823859593647751168 |
---|---|
author | Zong-En Wu; Shao-Jung Chan; Yeshanew Ale Wubet; Kuang-Yow Lian |
author_sort | Zong-En Wu |
collection | DOAJ |
description | Deployment of deep learning-based speech processing models for real-world applications on devices with limited processing capacity and memory constraints poses significant challenges. This paper introduces an enhanced deep learning model based on the X-vector architecture, named dilated light-gated recurrent unit in X-vector (DLiGRU-X) to address these challenges specifically for small-footprint keyword spotting (KWS) tasks. The DLiGRU-X model enhances temporal feature extraction with a LiGRU and reduces computational complexity through dilated convolution techniques. The proposed model efficiently learns speech signal characteristics, making it suitable for scenarios with limited hardware resources and can handle an expanded vocabulary size of keyword identification. The proposed model is validated on the Google Speech Command public dataset, and its performance is compared with other recently proposed deep learning models for KWS. The proposed model achieves an excellent trade-off between recognition accuracy and computational complexity, outperforming various advanced keyword spotting models. Notably, despite a reduction in model parameters, DLiGRU-X maintains an accuracy of 97% without significant decline. This model offers greater flexibility compared to previous models, allowing users to adjust and expand the set of targeted vocabulary according to their needs and deploy the model in resource-constrained environments. |
format | Article |
id | doaj-art-19207b5a2de7499e92091379ed0edba2 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-19207b5a2de7499e92091379ed0edba2; 2025-02-11T00:00:47Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 23498-23507; DOI 10.1109/ACCESS.2025.3536470; document 10858118; DLiGRU-X: Efficient X-Vector-Based Embeddings for Small-Footprint Keyword Spotting System; Zong-En Wu, Shao-Jung Chan (https://orcid.org/0009-0000-9203-9305), Yeshanew Ale Wubet (https://orcid.org/0000-0002-1411-715X), Kuang-Yow Lian (https://orcid.org/0000-0002-5692-9279), all with the Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan; https://ieeexplore.ieee.org/document/10858118/; Dilated-LiGRU; deep learning; GRU; keyword spotting; X-vector |
title | DLiGRU-X: Efficient X-Vector-Based Embeddings for Small-Footprint Keyword Spotting System |
topic | Dilated-LiGRU; deep learning; GRU; keyword spotting; X-vector |
url | https://ieeexplore.ieee.org/document/10858118/ |