Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset
Abstract Asthma diagnosis poses challenges due to underreporting of symptoms, misdiagnoses, and limitations in existing diagnostic tests. Machine learning (ML) offers a promising avenue for addressing these challenges by leveraging demographic and clinical data. In this study, we aim to compare diff...
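Main Authors: | Omid Kohandel Gargari, Mobina Fathi, Shahryar Rajai Firouzabadi, Ida Mohammadi, Mohammad Hossein Mahmoudi, Mehran Sarmadi, Arman Shafiee |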
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-02-01 |
Series: | Scientific Reports |
Subjects: | Asthma; Machine learning; Support vector machine; Bronchitis |
Online Access: | https://doi.org/10.1038/s41598-025-88345-1 |
_version_ | 1823862311776944128 |
---|---|
author | Omid Kohandel Gargari Mobina Fathi Shahryar Rajai Firouzabadi Ida Mohammadi Mohammad Hossein Mahmoudi Mehran Sarmadi Arman Shafiee |
author_facet | Omid Kohandel Gargari Mobina Fathi Shahryar Rajai Firouzabadi Ida Mohammadi Mohammad Hossein Mahmoudi Mehran Sarmadi Arman Shafiee |
author_sort | Omid Kohandel Gargari |
collection | DOAJ |
description | Abstract Asthma diagnosis poses challenges due to underreporting of symptoms, misdiagnoses, and limitations in existing diagnostic tests. Machine learning (ML) offers a promising avenue for addressing these challenges by leveraging demographic and clinical data. In this study, we aim to compare different ML diagnostic models and obtain the most valuable features for asthma diagnosis using data from the National Health and Nutrition Examination Survey (NHANES) dataset. A total of 8,888 participants with available asthma diagnosis data from the 2017–2018 NHANES survey were included. After careful selection of variables related to asthma, various ML algorithms including Support Vector Machine (SVM), Random Forest (RF), AdaBoost (ADA), XGBoost (XGB), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Multi-Layer Perceptron (MLP) were evaluated. SVM and ADA emerged as top performers with the highest area under the curve (AUC) scores of 0.72 and 0.71, respectively. RF exhibited high accuracy but low precision. Feature interpretation using SHapley Additive exPlanations (SHAP) values identified significant predictors such as close relative asthma history, dietary fat intake, and chronic bronchitis. Feature reduction experiments showed promising results without significant loss in predictive performance. Our findings demonstrate the potential diagnosis ability of ML algorithms, particularly SVM and ADA, in asthma diagnosis by incorporating diverse clinical and demographic factors. In addition, close relative asthma history, dietary fat intake, and chronic bronchitis could be suggested as the valuable asthma diagnosis features. These outcomes can bring promising results in early diagnosis of asthma. |
format | Article |
id | doaj-art-7e99891b423545ceac8e4c8a1b71261e |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-7e99891b423545ceac8e4c8a1b71261e 2025-02-09T12:33:04Z eng Nature Portfolio Scientific Reports 2045-2322 2025-02-01 151111 10.1038/s41598-025-88345-1 Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset Omid Kohandel Gargari 0; Mobina Fathi 1; Shahryar Rajai Firouzabadi 2; Ida Mohammadi 3; Mohammad Hossein Mahmoudi 4; Mehran Sarmadi 5; Arman Shafiee 6 Alborz Artificial Intelligence Association, Alborz University of Medical Sciences; Advanced Diagnostic and Interventional Radiology Research Center (ADIR); School of Medicine, Shahid Beheshti University of Medical Sciences; School of Medicine, Shahid Beheshti University of Medical Sciences; Industrial Engineering Department, Sharif University of Technology; Computer Engineering Department, Sharif University of Technology; Alborz Artificial Intelligence Association, Alborz University of Medical Sciences Abstract Asthma diagnosis poses challenges due to underreporting of symptoms, misdiagnoses, and limitations in existing diagnostic tests. Machine learning (ML) offers a promising avenue for addressing these challenges by leveraging demographic and clinical data. In this study, we aim to compare different ML diagnostic models and obtain the most valuable features for asthma diagnosis using data from the National Health and Nutrition Examination Survey (NHANES) dataset. A total of 8,888 participants with available asthma diagnosis data from the 2017–2018 NHANES survey were included. After careful selection of variables related to asthma, various ML algorithms including Support Vector Machine (SVM), Random Forest (RF), AdaBoost (ADA), XGBoost (XGB), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Multi-Layer Perceptron (MLP) were evaluated. SVM and ADA emerged as top performers with the highest area under the curve (AUC) scores of 0.72 and 0.71, respectively. RF exhibited high accuracy but low precision. Feature interpretation using SHapley Additive exPlanations (SHAP) values identified significant predictors such as close relative asthma history, dietary fat intake, and chronic bronchitis. Feature reduction experiments showed promising results without significant loss in predictive performance. Our findings demonstrate the potential diagnosis ability of ML algorithms, particularly SVM and ADA, in asthma diagnosis by incorporating diverse clinical and demographic factors. In addition, close relative asthma history, dietary fat intake, and chronic bronchitis could be suggested as the valuable asthma diagnosis features. These outcomes can bring promising results in early diagnosis of asthma. https://doi.org/10.1038/s41598-025-88345-1 Asthma; Machine learning; Support vector machine; Bronchitis |
spellingShingle | Omid Kohandel Gargari Mobina Fathi Shahryar Rajai Firouzabadi Ida Mohammadi Mohammad Hossein Mahmoudi Mehran Sarmadi Arman Shafiee Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset Scientific Reports Asthma Machine learning Support vector machine Bronchitis |
title | Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset |
title_full | Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset |
title_fullStr | Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset |
title_full_unstemmed | Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset |
title_short | Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset |
title_sort | assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in united states adults based on nhanes dataset |
topic | Asthma Machine learning Support vector machine Bronchitis |
url | https://doi.org/10.1038/s41598-025-88345-1 |
work_keys_str_mv | AT omidkohandelgargari assessingthediagnosticaccuracyofmachinelearningalgorithmsforidentificationofasthmainunitedstatesadultsbasedonnhanesdataset AT mobinafathi assessingthediagnosticaccuracyofmachinelearningalgorithmsforidentificationofasthmainunitedstatesadultsbasedonnhanesdataset AT shahryarrajaifirouzabadi assessingthediagnosticaccuracyofmachinelearningalgorithmsforidentificationofasthmainunitedstatesadultsbasedonnhanesdataset AT idamohammadi assessingthediagnosticaccuracyofmachinelearningalgorithmsforidentificationofasthmainunitedstatesadultsbasedonnhanesdataset AT mohammadhosseinmahmoudi assessingthediagnosticaccuracyofmachinelearningalgorithmsforidentificationofasthmainunitedstatesadultsbasedonnhanesdataset AT mehransarmadi assessingthediagnosticaccuracyofmachinelearningalgorithmsforidentificationofasthmainunitedstatesadultsbasedonnhanesdataset AT armanshafiee assessingthediagnosticaccuracyofmachinelearningalgorithmsforidentificationofasthmainunitedstatesadultsbasedonnhanesdataset |
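
The abstract in this record ranks models by area under the curve (AUC) and notes that one model had high accuracy but low precision. As a reading aid, the sketch below shows how those three metrics are computed. It is not the authors' code: the labels, scores, and the 0.65 threshold are invented toy values, and the helper names `roc_auc` and `accuracy_and_precision` are hypothetical.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_precision(
    labels: Sequence[int], predictions: Sequence[int]
) -> Tuple[float, float]:
    """Accuracy = correct / total; precision = TP / (TP + FP)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    accuracy = correct / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Toy, imbalanced data (assumption: 8 negatives, 2 positives; not NHANES data).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.4, 0.8]

# Hypothetical decision threshold used to turn scores into class predictions.
predictions = [1 if s >= 0.65 else 0 for s in scores]

auc = roc_auc(labels, scores)
accuracy, precision = accuracy_and_precision(labels, predictions)
print(f"AUC={auc:.3f} accuracy={accuracy:.2f} precision={precision:.2f}")
```

On imbalanced toy data like this, accuracy is dominated by the majority (negative) class, so it can look good while precision stays modest. AUC does not depend on the chosen threshold, which is one reason it is commonly used to compare classifiers.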