Generalizability of machine learning models for diabetes detection a study with nordic islet transplant and PIMA datasets

Bibliographic Details
Main Authors: Dinesh Chellappan, Harikumar Rajaguru
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-87471-0
author Dinesh Chellappan
Harikumar Rajaguru
collection DOAJ
description Diabetes Mellitus (DM) is a global health challenge, and accurate early detection is critical for effective management. This study explores the potential of machine learning to improve diabetes prediction using microarray gene expression data and the PIMA dataset. The authors apply a hybrid feature extraction method combining Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO). This is followed by metaheuristic feature selection with Harmony Search (HS), the Dragonfly Algorithm (DFA), and Elephant Herding Optimization (EHO). System performance is evaluated with the following classifiers: Non-Linear Regression (NLR), Linear Regression (LR), Gaussian Mixture Model (GMM), Expectation Maximization (EM), Bayesian Linear Discriminant Analysis (BLDA), Softmax Discriminant Classifier (SDC), and Support Vector Machine with a Radial Basis Function kernel (SVM-RBF). Two publicly available datasets are used: the Nordic Islet Transplant Program (NITP) dataset and the PIMA Indian Diabetes Dataset (PIDD). The findings show a substantial improvement in classification accuracy compared with using all genes. On the NITP dataset, ABC-PSO feature extraction combined with EHO feature selection achieved the highest accuracy, 97.14%, surpassing the 94.28% obtained with ABC extraction alone followed by EHO selection. Similarly, on the PIDD, the ABC-PSO and EHO combination achieved the best accuracy, 98.13%, exceeding the 95.45% obtained with ABC extraction and DFA selection. These results highlight the effectiveness of the proposed approach in identifying the most informative features for accurate diabetes prediction. The performance parameters obtained on the two datasets are similar. This indicates that the combination of feature extraction (FE), feature selection (FS), and classification techniques is robust across two different datasets.
format Article
id doaj-art-deb184beb95f46c88cff9f2e1893bb6a
institution Kabale University
issn 2045-2322
language English
publishDate 2025-02-01
publisher Nature Portfolio
series Scientific Reports
affiliation Dinesh Chellappan, Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology
Harikumar Rajaguru, Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology
title Generalizability of machine learning models for diabetes detection a study with nordic islet transplant and PIMA datasets
topic Early diabetes detection
Machine learning
Microarray gene expression data
Feature extraction
Feature selection
Nordic Islet Transplant Program (NITP)
url https://doi.org/10.1038/s41598-025-87471-0