Development of a respiratory virus risk model with environmental data based on interpretable machine learning methods
Abstract In recent years, numerous studies have explored the relationship between atmospheric conditions and respiratory viral infections. However, these investigations have faced certain limitations, such as the use of modestly sized datasets, a restricted geographical focus, and an emphasis on a l...
Main Authors: | Shuting Shi, Haowen Lin, Leiming Jiang, Zhiqi Zeng, ChuiXu Lin, Pei Li, Yinghua Li, Zifeng Yang |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-02-01 |
Series: | npj Climate and Atmospheric Science |
Online Access: | https://doi.org/10.1038/s41612-025-00894-4 |
_version_ | 1825197346565128192 |
---|---|
author | Shuting Shi Haowen Lin Leiming Jiang Zhiqi Zeng ChuiXu Lin Pei Li Yinghua Li Zifeng Yang |
author_sort | Shuting Shi |
collection | DOAJ |
description | Abstract: In recent years, numerous studies have explored the relationship between atmospheric conditions and respiratory viral infections. However, these investigations have faced certain limitations, such as modestly sized datasets, a restricted geographical focus, and an emphasis on a limited number of respiratory pathogens. This study aimed to develop a nationwide respiratory virus infection risk prediction model using a machine learning approach. We utilized the CRFC algorithm, a random forest-based method for multi-label classification, to predict the presence of various respiratory viruses. The model integrated binary classification outcomes for each virus category and incorporated air quality and meteorological data to enhance its accuracy. The data were collected from 31 regions in China between 2016 and 2021 and encompassed pathogen detection, air quality indices, and meteorological measurements. The model’s performance was evaluated using ROC curves, AUC scores, and precision-recall curves. Our model demonstrated robust performance across various metrics, with an average overall accuracy of 0.76, macro sensitivity of 0.75, macro precision of 0.77, and an average AUC score of 0.9. The SHAP framework was employed to interpret the model’s predictions, revealing significant contributions from parameters such as age, NO2 levels, and meteorological conditions. Our model provides a reliable tool for predicting respiratory virus risks, with a comprehensive integration of environmental and clinical data. The model’s performance metrics indicate its potential utility in clinical decision-making and public health planning. Future work will focus on refining the model and expanding its applicability to diverse populations and settings. |
format | Article |
id | doaj-art-5824afa13cc14564851426a465f91ced |
institution | Kabale University |
issn | 2397-3722 |
language | English |
publishDate | 2025-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | npj Climate and Atmospheric Science |
spelling | doaj-art-5824afa13cc14564851426a465f91ced; 2025-02-09T12:27:11Z; eng; Nature Portfolio; npj Climate and Atmospheric Science; 2397-3722; 2025-02-01; 8; 1; 1; 11; 10.1038/s41612-025-00894-4; Development of a respiratory virus risk model with environmental data based on interpretable machine learning methods; Shuting Shi (Guangzhou KingMed Diagnostics Group Co., Ltd.); Haowen Lin (Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences); Leiming Jiang (KingMed School of Laboratory Medicine, Guangzhou Medical University); Zhiqi Zeng (KingMed School of Laboratory Medicine, Guangzhou Medical University); ChuiXu Lin (Guangzhou KingMed Diagnostics Group Co., Ltd.); Pei Li (Guangzhou KingMed Diagnostics Group Co., Ltd.); Yinghua Li (Guangzhou KingMed Diagnostics Group Co., Ltd.); Zifeng Yang (State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University); https://doi.org/10.1038/s41612-025-00894-4 |
title | Development of a respiratory virus risk model with environmental data based on interpretable machine learning methods |
url | https://doi.org/10.1038/s41612-025-00894-4 |
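
The abstract in this record reports macro sensitivity, macro precision, and AUC for a multi-label model that makes one binary decision per virus category. As a rough illustration of how such figures are typically computed, the sketch below macro-averages per-label sensitivity and precision and computes a rank-based AUC. It is not the authors' code: the label names, the toy data, the function names, and the choice to score a label as 0.0 when its denominator is zero are all assumptions made for this example.

```python
from typing import Dict, Sequence


def binary_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for one binary label."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def macro_metrics(
    y_true: Dict[str, Sequence[int]], y_pred: Dict[str, Sequence[int]]
) -> Dict[str, float]:
    """Unweighted mean of per-label sensitivity and precision.

    Both arguments map a label name to a list of 0/1 values, one per sample.
    Assumption: a label with a zero denominator contributes 0.0.
    """
    sensitivities, precisions = [], []
    for label in y_true:
        c = binary_counts(y_true[label], y_pred[label])
        pos = c["tp"] + c["fn"]        # samples that truly carry the label
        flagged = c["tp"] + c["fp"]    # samples predicted to carry the label
        sensitivities.append(c["tp"] / pos if pos else 0.0)
        precisions.append(c["tp"] / flagged if flagged else 0.0)
    n = len(sensitivities)
    return {
        "macro_sensitivity": sum(sensitivities) / n,
        "macro_precision": sum(precisions) / n,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and toy predictions, not data from the study.
    truth = {"virus_a": [1, 1, 0, 0, 1, 0], "virus_b": [0, 1, 0, 0, 1, 1]}
    preds = {"virus_a": [1, 0, 0, 0, 1, 1], "virus_b": [0, 1, 0, 1, 1, 1]}
    print(macro_metrics(truth, preds))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

Macro averaging weights every virus category equally regardless of how often it occurs, so a rare pathogen influences the summary figure as much as a common one.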