Improving stroke risk prediction by integrating XGBoost, optimized principal component analysis, and explainable artificial intelligence

Abstract The study is motivated by the growing incidence of cerebrovascular diseases, in particular stroke, which is one of the leading causes of disability and mortality worldwide. To improve stroke risk prediction models in terms of efficiency and interpretability, we prop...

Bibliographic Details
Main Authors: Lesia Mochurad, Viktoriia Babii, Yuliia Boliubash, Yulianna Mochurad
Format: Article
Language: English
Published: BMC 2025-02-01
Series: BMC Medical Informatics and Decision Making
Subjects:
Online Access: https://doi.org/10.1186/s12911-025-02894-z
collection DOAJ
description Abstract The study is motivated by the growing incidence of cerebrovascular diseases, in particular stroke, which is one of the leading causes of disability and mortality worldwide. To improve the efficiency and interpretability of stroke risk prediction models, we propose integrating modern machine learning algorithms with dimensionality reduction methods, in particular XGBoost and optimized principal component analysis (PCA), which structure the data and increase processing speed, especially for large datasets. For the first time, explainable artificial intelligence (XAI) is integrated into the PCA process, which increases transparency and interpretability and gives medical professionals a better understanding of risk factors. The proposed approach was tested on two datasets, achieving accuracies of 95% and 98%. Cross-validation yielded an average value of 0.99, and high values of the Matthews correlation coefficient (MCC, 0.96) and Cohen's kappa (CK, 0.96) confirmed the generalizability and reliability of the model. OpenMP parallelization increases processing speed threefold, which makes the approach practical to apply. Thus, the proposed method is innovative and can potentially improve forecasting systems in the healthcare industry.
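Note on the reported metrics: the abstract cites the Matthews correlation coefficient (MCC) and Cohen's kappa without defining them. The sketch below shows one common way to compute both from a binary confusion matrix. The function names and the example counts are illustrative assumptions, not taken from the article, and this is not the authors' implementation.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement between predictions and labels beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only (not study data).
    counts = dict(tp=45, fp=5, fn=5, tn=45)
    print("MCC:  ", round(mcc(**counts), 3))
    print("kappa:", round(cohen_kappa(**counts), 3))
```

Both metrics range up to 1 for perfect agreement and, unlike plain accuracy, account for class imbalance, which is presumably why the abstract reports them alongside accuracy.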
id doaj-art-2ea570fcc1354cb59fd3bc19819cf9c5
institution Kabale University
issn 1472-6947
spelling doaj-art-2ea570fcc1354cb59fd3bc19819cf9c5; 2025-02-09T12:40:20Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-02-01; 25; 1; 1; 23; 10.1186/s12911-025-02894-z
Lesia Mochurad, Viktoriia Babii, Yuliia Boliubash: Artificial Intelligence Department, Lviv Polytechnic National University
Yulianna Mochurad: Danylo Halytsky Lviv National Medical University
topic Machine learning
Parallel computing technologies
SHAP method
Class balancing
PCA method