Generalizability of machine learning models for diabetes detection a study with nordic islet transplant and PIMA datasets


Bibliographic Details
Main Authors:	Dinesh Chellappan, Harikumar Rajaguru
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-02-01
Series:	Scientific Reports
Subjects:	Early diabetes detection; Machine learning; Microarray gene expression data; Feature extraction; Feature selection; Nordic Islet Transplant Program (NITP)
Online Access:	https://doi.org/10.1038/s41598-025-87471-0

collection	DOAJ
description	Diabetes Mellitus (DM) is a global health challenge, and accurate early detection is critical for effective management. This study explores the potential of machine learning for improved diabetes prediction using microarray gene expression data and the PIMA dataset. The researchers use a hybrid feature extraction method combining Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), followed by the metaheuristic feature selection algorithms Harmony Search (HS), the Dragonfly Algorithm (DFA), and Elephant Herding Optimization (EHO). System performance is evaluated with seven classifiers, namely Non-Linear Regression (NLR), Linear Regression (LR), Gaussian Mixture Model (GMM), Expectation Maximization (EM), Bayesian Linear Discriminant Analysis (BLDA), Softmax Discriminant Classifier (SDC), and Support Vector Machine with a Radial Basis Function kernel (SVM-RBF), on two publicly available datasets: the Nordic Islet Transplant Program (NITP) dataset and the PIMA Indian Diabetes Dataset (PIDD). The findings show a significant improvement in classification accuracy compared with using all genes. On the Nordic islet transplant dataset, ABC-PSO feature extraction combined with EHO feature selection achieved the highest accuracy, 97.14%, surpassing the 94.28% obtained with ABC alone followed by EHO selection. Similarly, on the PIMA Indian diabetes dataset, the ABC-PSO and EHO combination achieved the best accuracy, 98.13%, exceeding the 95.45% obtained with ABC followed by DFA selection. These results highlight the effectiveness of the proposed approach in identifying the most informative features for accurate diabetes prediction. The parameter values obtained for the two datasets are very similar, which indicates that the feature extraction (FE), feature selection (FS), and classification techniques are robust across two different datasets.
id	doaj-art-deb184beb95f46c88cff9f2e1893bb6a
institution	Kabale University
issn	2045-2322
affiliations	Dinesh Chellappan: Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology; Harikumar Rajaguru: Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology
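The abstract names Harmony Search (HS) as one of the metaheuristic feature selectors but does not say how it is configured. The following is a minimal sketch of wrapper-style feature selection with a binary Harmony Search variant, not the authors' implementation. It assumes a nearest-centroid classifier scored by leave-one-out accuracy as the fitness function (a stand-in for the seven classifiers used in the paper), synthetic toy data in place of the NITP and PIMA datasets, and illustrative parameter values (hms, hmcr, par, penalty); all function names are hypothetical.

```python
# Hypothetical sketch: binary Harmony Search feature selection (not the authors' code).
import random


def make_toy_data(n_per_class=15, n_noise=4, seed=1):
    """Hypothetical two-class toy data: features 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, shift in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            informative = [rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the features where mask[j] == 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            # Class centroid computed without the held-out sample i.
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda lab: sum((X[i][j] - c) ** 2 for j, c in zip(idx, centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)


def harmony_search_select(X, y, hms=8, hmcr=0.9, par=0.3, iters=200, penalty=0.01, seed=0):
    """Binary Harmony Search over feature masks; fitness = accuracy - penalty * (number of features)."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(mask):
        return centroid_accuracy(X, y, mask) - penalty * sum(mask)

    # Harmony memory: hms random binary masks and their fitness values.
    memory = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(hms)]
    scores = [fitness(h) for h in memory]

    for _ in range(iters):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]  # memory consideration
                if rng.random() < par:
                    bit = 1 - bit  # pitch adjustment, simplified to a bit flip
            else:
                bit = rng.randint(0, 1)  # random consideration
            new.append(bit)
        score = fitness(new)
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:  # replace the worst harmony if the new one is better
            memory[worst], scores[worst] = new, score

    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = harmony_search_select(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("penalized LOO accuracy:", round(score, 3))
```

Each candidate feature subset is a 0/1 mask, and the small per-feature penalty favors compact subsets when accuracies are close. The bit-flip pitch adjustment is a common simplification for binary problems and is not necessarily what the paper uses; swapping centroid_accuracy for a cross-validated SVM-RBF score would bring the sketch closer to the classifiers listed in the abstract.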

