An IoT-enhanced automatic music composition system integrating audio-visual learning with transformer and SketchVAE

With the rapid development of artificial intelligence and Internet of Things (IoT) technology, automatic music composition systems have become a hot research topic. This paper presents TransVAE-Music, a composition system that achieves efficient multimodal data perception and fusion. By incorporating IoT technology, the system collects and processes audio, video, and other data in real time, improving the diversity and artistry of the generated music. A Bayesian optimization mechanism fine-tunes the system's hyperparameters to further improve model performance. Experimental results show that TransVAE-Music achieves reconstruction errors of 1.10 and 1.12 on the POP909 and FMA datasets, respectively, significantly outperforming other mainstream automatic music generation models. The model also reaches perceived quality scores (PQS) of 4.8 and 4.9 and user satisfaction scores (USS) of 4.4 and 4.5, respectively. These results demonstrate that the proposed system offers significant advantages in both the accuracy of music generation and the user experience. This study not only provides an effective method for automatic music generation but also offers important references for future work on multimodal data fusion and high-quality music generation.


Bibliographic Details
Main Author: Yifei Zhang
Format: Article
Language:English
Published: Elsevier 2025-02-01
Series:Alexandria Engineering Journal
Subjects: Automatic music composition; Music generation; Deep learning; Audio–visual learning; Internet of things (IoT); Multimodal perception
Online Access:http://www.sciencedirect.com/science/article/pii/S1110016824012808
Collection: DOAJ
ISSN: 1110-0168
Institution: Kabale University
Published in: Alexandria Engineering Journal, Vol. 113, pp. 378–390 (2025-02-01)
Author affiliation: Master's student, Department of Composition and Conducting, Shanghai Conservatory of Music, 200031, Shanghai, China