Improving stroke risk prediction by integrating XGBoost, optimized principal component analysis, and explainable artificial intelligence
Abstract This study is motivated by the growing incidence of cerebrovascular diseases, in particular stroke, one of the leading causes of disability and mortality worldwide. To improve the efficiency and interpretability of stroke risk prediction models, we prop...
Main Authors: | Lesia Mochurad, Viktoriia Babii, Yuliia Boliubash, Yulianna Mochurad |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2025-02-01 |
Series: | BMC Medical Informatics and Decision Making |
Subjects: | Machine learning, Parallel computing technologies, SHAP method, Class balancing, PCA method |
Online Access: | https://doi.org/10.1186/s12911-025-02894-z |
_version_ | 1823861949982572544 |
---|---|
author | Lesia Mochurad, Viktoriia Babii, Yuliia Boliubash, Yulianna Mochurad |
author_sort | Lesia Mochurad |
collection | DOAJ |
description | Abstract This study is motivated by the growing incidence of cerebrovascular diseases, in particular stroke, one of the leading causes of disability and mortality worldwide. To improve the efficiency and interpretability of stroke risk prediction models, we propose integrating modern machine learning algorithms with dimensionality reduction methods, specifically XGBoost and optimized principal component analysis (PCA), which structure the data and increase processing speed, especially for large datasets. For the first time, explainable artificial intelligence (XAI) is integrated into the PCA process, which increases transparency and interpretability and gives medical professionals a better understanding of risk factors. The proposed approach was tested on two datasets, achieving accuracies of 95% and 98%. Cross-validation yielded an average value of 0.99, and a Matthews correlation coefficient (MCC) of 0.96 and a Cohen's kappa (CK) of 0.96 support the generalizability and reliability of the model. OpenMP parallelization increases processing speed threefold, which makes practical application feasible. The proposed method can therefore potentially improve forecasting systems in healthcare. |
format | Article |
id | doaj-art-2ea570fcc1354cb59fd3bc19819cf9c5 |
institution | Kabale University |
issn | 1472-6947 |
language | English |
publishDate | 2025-02-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Informatics and Decision Making |
spelling | doaj-art-2ea570fcc1354cb59fd3bc19819cf9c5; 2025-02-09T12:40:20Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-02-01; 251123; 10.1186/s12911-025-02894-z; Improving stroke risk prediction by integrating XGBoost, optimized principal component analysis, and explainable artificial intelligence; Lesia Mochurad (Artificial Intelligence Department, Lviv Polytechnic National University); Viktoriia Babii (Artificial Intelligence Department, Lviv Polytechnic National University); Yuliia Boliubash (Artificial Intelligence Department, Lviv Polytechnic National University); Yulianna Mochurad (Danylo Halytsky Lviv National Medical University); Abstract This study is motivated by the growing incidence of cerebrovascular diseases, in particular stroke, one of the leading causes of disability and mortality worldwide. To improve the efficiency and interpretability of stroke risk prediction models, we propose integrating modern machine learning algorithms with dimensionality reduction methods, specifically XGBoost and optimized principal component analysis (PCA), which structure the data and increase processing speed, especially for large datasets. For the first time, explainable artificial intelligence (XAI) is integrated into the PCA process, which increases transparency and interpretability and gives medical professionals a better understanding of risk factors. The proposed approach was tested on two datasets, achieving accuracies of 95% and 98%. Cross-validation yielded an average value of 0.99, and a Matthews correlation coefficient (MCC) of 0.96 and a Cohen's kappa (CK) of 0.96 support the generalizability and reliability of the model. OpenMP parallelization increases processing speed threefold, which makes practical application feasible. The proposed method can therefore potentially improve forecasting systems in healthcare.; https://doi.org/10.1186/s12911-025-02894-z; Machine learning; Parallel computing technologies; SHAP method; Class balancing; PCA method |
title | Improving stroke risk prediction by integrating XGBoost, optimized principal component analysis, and explainable artificial intelligence |
topic | Machine learning, Parallel computing technologies, SHAP method, Class balancing, PCA method |
url | https://doi.org/10.1186/s12911-025-02894-z |
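
The abstract in this record reports a Matthews correlation coefficient (MCC) and a Cohen's kappa (CK) for a binary stroke classifier. As a rough illustration of what those two metrics measure, the sketch below computes both from a binary confusion matrix using their standard definitions. It is not the authors' code: the function names and the toy labels are hypothetical and chosen only for this example, and the labels are not taken from the study's datasets.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = stroke, 0 = no stroke)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def kappa_from_counts(tp, tn, fp, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals of the confusion matrix.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Hypothetical toy labels, for illustration only.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]

toy_counts = confusion_counts(toy_true, toy_pred)
toy_mcc = mcc_from_counts(*toy_counts)
toy_kappa = kappa_from_counts(*toy_counts)
```

Both metrics account for all four cells of the confusion matrix, which is why they tend to be more informative than plain accuracy on imbalanced data such as stroke records, where the negative class dominates.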