Advanced analysis of soil pollution in southwestern Ghana using Variational Autoencoders (VAE) and positive matrix factorization (PMF)

Bibliographic Details
Main Authors: Raymond Webrah Kazapoe, Daniel Kwayisi, Seidu Alidu, Samuel Dzidefo Sagoe, Aliyu Ohiani Umaru, Ebenezer Ebo Yahans Amuah, Millicent Obeng Addai, Obed Fiifi Fynn
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Environmental and Sustainability Indicators
Online Access: http://www.sciencedirect.com/science/article/pii/S2665972725000480
Description
Summary: The study combined the Positive Matrix Factorization (PMF) receptor model with a Variational Autoencoder (VAE) machine learning technique and ecological risk indices to study the spatial distribution, sources and patterns of soil pollution in the study area. A total of 719 soil samples were analysed for the concentrations of selected Potentially Toxic Elements (PTEs). Arsenic (As, 9.68 mg/L) and lead (Pb, 7.43 mg/L) showed elevated levels across the area, linked to mining activities. PTE concentrations decreased in the order Ba > Cr > V > Zn > Cu > Ni > As > Pb > Co. The Pearson correlation matrix outlines two main groups of PTEs: (1) moderately correlated (Ba, Cr, Cu, Ni and V) and (2) weakly correlated (As, Pb and Zn). These relationships are corroborated by the VAE, which showed a low contribution by As and a high contribution by V to all the latent dimensions. The PMF revealed three factors: Factor 1 (geogenic): Ba (77.5%), Cu (54.4%), Ni (66.4%), V (54.0%) and Cr (46.8%); Factor 2 (mixed): Co (61.6%), Pb (64.8%) and Zn (71.0%); Factor 3 (anthropogenic): As (86.7%). The degree of contamination analysis indicates that 69.03% of the samples are moderately polluted, while 15.14% and 0.28% show considerable and very high pollution, respectively. The pollution load index shows evidence of pollution in 20% of the samples. The Potential Ecological Risk Index (RI) values suggest low pollution for most samples (97.08%), while 2.92% indicate moderate pollution. Integrating chemometric and machine learning techniques provides a dynamic system that can monitor pollution shifts early to aid remediation efforts in highly affected areas.
ISSN: 2665-9727
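
The index-based part of the summary (degree of contamination, pollution load index and RI) can be illustrated with a minimal sketch. It assumes the commonly used Hakanson-style definitions: a contamination factor CF = concentration / background, degree of contamination as the sum of CFs, pollution load index as the geometric mean of CFs, and RI as the sum of toxic-response-weighted CFs. The summary does not state which background values, toxic response factors or class thresholds the authors used, so the function name `pollution_indices`, the sample concentrations and the background values below are hypothetical, and the toxic response factors (As = 10, Pb = 5, Zn = 1) are values commonly cited in the literature rather than ones confirmed by this record.

```python
import math


def pollution_indices(conc, background, toxic_response):
    """Compute common soil pollution indices for a single sample.

    conc, background: element -> concentration (same unit for both).
    toxic_response:   element -> toxic response factor (assumed values).
    """
    # Contamination factor: measured concentration relative to background.
    cf = {el: conc[el] / background[el] for el in conc}

    # Degree of contamination: sum of all contamination factors.
    c_deg = sum(cf.values())

    # Pollution load index: geometric mean of the contamination factors.
    pli = math.prod(cf.values()) ** (1.0 / len(cf))

    # Potential ecological risk index: sum of toxicity-weighted factors.
    ri = sum(toxic_response[el] * cf[el] for el in cf)

    return {"cf": cf, "c_deg": c_deg, "pli": pli, "ri": ri}


# Hypothetical sample and background values, for illustration only.
sample = {"As": 20.0, "Pb": 10.0, "Zn": 50.0}
background = {"As": 10.0, "Pb": 20.0, "Zn": 50.0}
# Commonly cited toxic response factors (an assumption, not from the study).
toxic_response = {"As": 10.0, "Pb": 5.0, "Zn": 1.0}

result = pollution_indices(sample, background, toxic_response)
print(result["c_deg"], result["pli"], result["ri"])
```

Under these definitions a pollution load index above 1 is usually read as evidence of pollution. The percentages reported in the summary depend on the authors' own reference values and class boundaries, which this sketch does not reproduce.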