Development of a respiratory virus risk model with environmental data based on interpretable machine learning methods

Bibliographic Details
Main Authors: Shuting Shi, Haowen Lin, Leiming Jiang, Zhiqi Zeng, ChuiXu Lin, Pei Li, Yinghua Li, Zifeng Yang
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: npj Climate and Atmospheric Science
Online Access: https://doi.org/10.1038/s41612-025-00894-4
collection DOAJ
description Abstract: In recent years, numerous studies have explored the relationship between atmospheric conditions and respiratory viral infections. However, these investigations have been limited by modestly sized datasets, a restricted geographical focus, and an emphasis on a small number of respiratory pathogens. This study aimed to develop a nationwide respiratory virus infection risk prediction model using a machine learning approach. We used the CRFC algorithm, a random forest-based method for multi-label classification, to predict the presence of various respiratory viruses. The model integrated binary classification outcomes for each virus category and incorporated air quality and meteorological data to enhance its accuracy. The data were collected from 31 regions in China between 2016 and 2021 and comprised pathogen detection results, air quality indices, and meteorological measurements. The model's performance was evaluated using ROC curves, AUC scores, and precision-recall curves. The model achieved an average overall accuracy of 0.76, a macro sensitivity of 0.75, a macro precision of 0.77, and an average AUC score of 0.9. The SHAP framework was employed to interpret the model's predictions, revealing significant contributions from parameters such as age, NO2 levels, and meteorological conditions. Our model provides a reliable tool for predicting respiratory virus risks that integrates environmental and clinical data. Its performance metrics indicate potential utility in clinical decision-making and public health planning. Future work will focus on refining the model and expanding its applicability to diverse populations and settings.
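The macro-averaged metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the multi-label task is scored as one binary problem per virus and that the per-virus values are then averaged with equal weight; the record does not state how the authors computed their figures, so this is an illustration of the general technique, not their method. The label names (`virus_a`, `virus_b`), the toy data, and the helper names are all hypothetical.

```python
from typing import Dict, List, Tuple


def confusion(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for one binary label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def macro_metrics(truth: Dict[str, List[int]],
                  pred: Dict[str, List[int]]) -> Dict[str, float]:
    """Average per-label sensitivity, precision and accuracy with equal weight."""
    sens, prec, acc = [], [], []
    for label in truth:
        tp, fp, fn, tn = confusion(truth[label], pred[label])
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
        acc.append((tp + tn) / (tp + fp + fn + tn))
    n = len(sens)
    return {
        "macro_sensitivity": sum(sens) / n,
        "macro_precision": sum(prec) / n,
        "mean_accuracy": sum(acc) / n,
    }


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two virus labels, four samples each.
truth = {"virus_a": [1, 1, 0, 0], "virus_b": [1, 0, 0, 0]}
pred = {"virus_a": [1, 0, 0, 0], "virus_b": [1, 1, 0, 0]}
result = macro_metrics(truth, pred)
auc_a = auc(truth["virus_a"], [0.9, 0.4, 0.5, 0.1])
```

Macro averaging gives every virus the same weight regardless of how often it is detected, so a rarely detected pathogen affects the summary figure as much as a common one.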
id doaj-art-5824afa13cc14564851426a465f91ced
institution Kabale University
issn 2397-3722
Author affiliations:
Shuting Shi: Guangzhou KingMed Diagnostics Group Co., Ltd.
Haowen Lin: Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences
Leiming Jiang: KingMed School of Laboratory Medicine, Guangzhou Medical University
Zhiqi Zeng: KingMed School of Laboratory Medicine, Guangzhou Medical University
ChuiXu Lin: Guangzhou KingMed Diagnostics Group Co., Ltd.
Pei Li: Guangzhou KingMed Diagnostics Group Co., Ltd.
Yinghua Li: Guangzhou KingMed Diagnostics Group Co., Ltd.
Zifeng Yang: State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University