Generalizability of machine learning models for diabetes detection a study with nordic islet transplant and PIMA datasets
Abstract: Diabetes Mellitus (DM) is a global health challenge, and accurate early detection is critical for effective management. The study explores the potential of machine learning for improved diabetes prediction using microarray gene expression data and the PIMA dataset. Researchers utilizing a hybr...
Main Authors: | Dinesh Chellappan, Harikumar Rajaguru |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-02-01 |
Series: | Scientific Reports |
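Subjects: | Early diabetes detection; Machine learning; Microarray gene expression data; Feature extraction; Feature selection; Nordic Islet Transplant Program (NITP) |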
Online Access: | https://doi.org/10.1038/s41598-025-87471-0 |
_version_ | 1823862347768266752 |
---|---|
author | Dinesh Chellappan; Harikumar Rajaguru |
author_facet | Dinesh Chellappan; Harikumar Rajaguru |
author_sort | Dinesh Chellappan |
collection | DOAJ |
description | Abstract: Diabetes Mellitus (DM) is a global health challenge, and accurate early detection is critical for effective management. This study explores the potential of machine learning for improved diabetes prediction using microarray gene expression data and the PIMA dataset. The researchers apply hybrid feature extraction methods, namely Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), followed by metaheuristic feature selection algorithms: Harmony Search (HS), the Dragonfly Algorithm (DFA), and Elephant Herding Optimization (EHO). System performance is evaluated with the following classifiers: Non-Linear Regression (NLR), Linear Regression (LR), Gaussian Mixture Model (GMM), Expectation Maximization (EM), Bayesian Linear Discriminant Analysis (BLDA), Softmax Discriminant Classifier (SDC), and Support Vector Machine with Radial Basis Function kernel (SVM-RBF), on two publicly available datasets: the Nordic Islet Transplant Program (NITP) dataset and the PIMA Indian Diabetes Dataset (PIDD). The findings demonstrate a significant improvement in classification accuracy compared with using all genes. On the Nordic islet transplant dataset, combined ABC-PSO feature extraction with EHO feature selection achieved the highest accuracy of 97.14%, surpassing the 94.28% obtained with ABC extraction alone followed by EHO selection. Similarly, on the PIMA Indian diabetes dataset, the ABC-PSO and EHO combination achieved the best accuracy of 98.13%, exceeding the 95.45% obtained with ABC extraction and DFA selection. These results highlight the effectiveness of the proposed approach in identifying the most informative features for accurate diabetes prediction. The parametric values attained for the two datasets are nearly the same, which indicates that the feature extraction (FE), feature selection (FS), and classifier techniques are robust across two different datasets. |
format | Article |
id | doaj-art-deb184beb95f46c88cff9f2e1893bb6a |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-deb184beb95f46c88cff9f2e1893bb6a; 2025-02-09T12:34:02Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-02-01; 15; 1; 1; 27; 10.1038/s41598-025-87471-0; Generalizability of machine learning models for diabetes detection a study with nordic islet transplant and PIMA datasets; Dinesh Chellappan (0), Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology; Harikumar Rajaguru (1), Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology; https://doi.org/10.1038/s41598-025-87471-0; Early diabetes detection; Machine learning; Microarray gene expression data; Feature extraction; Feature selection; Nordic Islet Transplant Program (NITP) |
spellingShingle | Dinesh Chellappan; Harikumar Rajaguru; Generalizability of machine learning models for diabetes detection a study with nordic islet transplant and PIMA datasets; Scientific Reports; Early diabetes detection; Machine learning; Microarray gene expression data; Feature extraction; Feature selection; Nordic Islet Transplant Program (NITP) |
title | Generalizability of machine learning models for diabetes detection a study with nordic islet transplant and PIMA datasets |
title_sort | generalizability of machine learning models for diabetes detection a study with nordic islet transplant and pima datasets |
topic | Early diabetes detection; Machine learning; Microarray gene expression data; Feature extraction; Feature selection; Nordic Islet Transplant Program (NITP) |
url | https://doi.org/10.1038/s41598-025-87471-0 |
work_keys_str_mv | AT dineshchellappan generalizabilityofmachinelearningmodelsfordiabetesdetectionastudywithnordicislettransplantandpimadatasets AT harikumarrajaguru generalizabilityofmachinelearningmodelsfordiabetesdetectionastudywithnordicislettransplantandpimadatasets |