Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset

Abstract Asthma diagnosis poses challenges due to underreporting of symptoms, misdiagnoses, and limitations in existing diagnostic tests. Machine learning (ML) offers a promising avenue for addressing these challenges by leveraging demographic and clinical data. In this study, we aim to compare diff...

Bibliographic Details
Main Authors: Omid Kohandel Gargari, Mobina Fathi, Shahryar Rajai Firouzabadi, Ida Mohammadi, Mohammad Hossein Mahmoudi, Mehran Sarmadi, Arman Shafiee
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Subjects: Asthma; Machine learning; Support vector machine; Bronchitis
Online Access: https://doi.org/10.1038/s41598-025-88345-1
author Omid Kohandel Gargari
Mobina Fathi
Shahryar Rajai Firouzabadi
Ida Mohammadi
Mohammad Hossein Mahmoudi
Mehran Sarmadi
Arman Shafiee
author_sort Omid Kohandel Gargari
collection DOAJ
description Abstract Asthma diagnosis poses challenges due to underreporting of symptoms, misdiagnoses, and limitations in existing diagnostic tests. Machine learning (ML) offers a promising avenue for addressing these challenges by leveraging demographic and clinical data. In this study, we aim to compare different ML diagnostic models and obtain the most valuable features for asthma diagnosis using data from the National Health and Nutrition Examination Survey (NHANES) dataset. A total of 8,888 participants with available asthma diagnosis data from the 2017–2018 NHANES survey were included. After careful selection of variables related to asthma, various ML algorithms including Support Vector Machine (SVM), Random Forest (RF), AdaBoost (ADA), XGBoost (XGB), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Multi-Layer Perceptron (MLP) were evaluated. SVM and ADA emerged as top performers with the highest area under the curve (AUC) scores of 0.72 and 0.71, respectively. RF exhibited high accuracy but low precision. Feature interpretation using SHapley Additive exPlanations (SHAP) values identified significant predictors such as close relative asthma history, dietary fat intake, and chronic bronchitis. Feature reduction experiments showed promising results without significant loss in predictive performance. Our findings demonstrate the potential diagnosis ability of ML algorithms, particularly SVM and ADA, in asthma diagnosis by incorporating diverse clinical and demographic factors. In addition, close relative asthma history, dietary fat intake, and chronic bronchitis could be suggested as the valuable asthma diagnosis features. These outcomes can bring promising results in early diagnosis of asthma.
format Article
id doaj-art-7e99891b423545ceac8e4c8a1b71261e
institution Kabale University
issn 2045-2322
language English
publishDate 2025-02-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-7e99891b423545ceac8e4c8a1b71261e
2025-02-09T12:33:04Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-02-01
151111
10.1038/s41598-025-88345-1
Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset
Omid Kohandel Gargari (Alborz Artificial Intelligence Association, Alborz University of Medical Sciences)
Mobina Fathi (Advanced Diagnostic and Interventional Radiology Research Center (ADIR))
Shahryar Rajai Firouzabadi (School of Medicine, Shahid Beheshti University of Medical Sciences)
Ida Mohammadi (School of Medicine, Shahid Beheshti University of Medical Sciences)
Mohammad Hossein Mahmoudi (Industrial Engineering Department, Sharif University of Technology)
Mehran Sarmadi (Computer Engineering Department, Sharif University of Technology)
Arman Shafiee (Alborz Artificial Intelligence Association, Alborz University of Medical Sciences)
https://doi.org/10.1038/s41598-025-88345-1
Asthma
Machine learning
Support vector machine
Bronchitis
title Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset
topic Asthma
Machine learning
Support vector machine
Bronchitis
url https://doi.org/10.1038/s41598-025-88345-1