RCE-IFE: recursive cluster elimination with intra-cluster feature elimination

Bibliographic Details
Main Authors: Cihan Kuzudisli, Burcu Bakir-Gungor, Bahjat Qaqish, Malik Yousef
Format: Article
Language: English
Published: PeerJ Inc. 2025-02-01
Series: PeerJ Computer Science
Online Access: https://peerj.com/articles/cs-2528.pdf
Collection: DOAJ

Description

The ever-increasing dimensionality of biological data generated by new technologies poses significant computational and interpretational challenges. Feature selection (FS) methods aim to reduce this dimensionality, and feature grouping has emerged as a foundation for FS techniques that seek to detect strong correlations among features and identify irrelevant ones. In this work, we propose Recursive Cluster Elimination with Intra-Cluster Feature Elimination (RCE-IFE), a method that groups features and iterates grouping and elimination steps in a supervised setting. We assess the dimensionality reduction and discriminatory capabilities of RCE-IFE on various high-dimensional datasets from different biological domains.

For a set of gene expression, microRNA (miRNA) expression, and methylation datasets, RCE-IFE is compared with RCE-IFE-SVM (the SVM-adapted version of RCE-IFE) and SVM-RCE. Averaged over the tested expression datasets, RCE-IFE attains an area under the curve (AUC) of 0.85 with the fewest features and the shortest running time, while RCE-IFE-SVM and SVM-RCE achieve similar AUCs of 0.84 and 0.83, respectively. Averaged over seven metagenomics datasets, RCE-IFE and SVM-RCE yield AUCs of 0.79 and 0.68, respectively, with RCE-IFE selecting significantly smaller feature subsets. Furthermore, RCE-IFE outperforms several state-of-the-art FS methods, including Minimum Redundancy Maximum Relevance (MRMR), Fast Correlation-Based Filter (FCBF), Information Gain (IG), Conditional Mutual Information Maximization (CMIM), SelectKBest (SKB), and eXtreme Gradient Boosting (XGBoost), obtaining an average AUC of 0.76 on five gene expression datasets. Compared with a similar tool, Multi-stage, RCE-IFE achieves a similar average accuracy of 89.27% using fewer features on four cancer-related datasets. RCE-IFE is also shown to be comparable with other biological domain knowledge-based Grouping-Scoring-Modeling (G-S-M) tools, including mirGediNET, 3Mint, and miRcorrNet.

Additionally, the biological relevance of the features selected by RCE-IFE is evaluated, and the method exhibits high consistency in the features it selects across multiple runs. Our experimental findings suggest that RCE-IFE provides robust classifier performance and significantly reduces feature size while maintaining feature relevance and consistency.
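The description above summarizes RCE-IFE only at a high level: features are grouped, and grouping and elimination steps are repeated in a supervised setting. The sketch below illustrates that general loop under assumptions of our own; it is not the authors' implementation. Features are grouped by absolute Pearson correlation to randomly chosen seed features (a stand-in for a proper clustering step), each feature is scored by a univariate rank AUC, and a cluster's score is the mean of its members' scores (used here in place of classifier-based cluster scoring). Each round discards the lower-scoring clusters, then discards the weaker members inside each surviving cluster. All function names and parameters (`rce_ife_sketch`, `drop_cluster_frac`, `keep_within_frac`, and so on) are hypothetical.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC of one feature against binary 0/1 labels."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def feature_score(column, labels):
    """Direction-free separability: an AUC of 0.2 is as informative as 0.8."""
    a = auc(column, labels)
    return max(a, 1.0 - a)


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def group_features(columns, features, k, rng):
    """Hypothetical grouping step: attach each feature to its most |correlated| seed."""
    seeds = rng.sample(features, min(k, len(features)))
    groups = {s: [] for s in seeds}
    for f in features:
        best = max(seeds, key=lambda s: abs(pearson(columns[f], columns[s])))
        groups[best].append(f)
    return [g for g in groups.values() if g]  # drop seeds that attracted nothing


def rce_ife_sketch(X, y, n_clusters=4, drop_cluster_frac=0.5,
                   keep_within_frac=0.5, min_features=2, seed=0):
    """Return the indices of the features that survive recursive elimination."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*X)]  # column-major view of X
    features = list(range(len(columns)))
    while len(features) > min_features:
        clusters = group_features(columns, features, n_clusters, rng)
        score = {f: feature_score(columns[f], y) for f in features}
        # Cluster elimination: rank clusters by mean member score, keep the top part.
        ranked = sorted(clusters, key=lambda c: mean(score[f] for f in c), reverse=True)
        n_keep = max(1, int(len(ranked) * (1 - drop_cluster_frac)))
        survivors = []
        for cluster in ranked[:n_keep]:
            # Intra-cluster elimination: drop the weakest members of each kept cluster.
            members = sorted(cluster, key=lambda f: score[f], reverse=True)
            survivors.extend(members[:max(1, int(len(members) * keep_within_frac))])
        if len(survivors) >= len(features):
            break  # nothing was eliminated this round; stop instead of looping forever
        features = sorted(survivors)
    return features


if __name__ == "__main__":
    # Toy data: column 0 separates the classes, columns 1-5 are constant noise.
    labels = [i % 2 for i in range(20)]
    data = [[float(i % 2) + 0.01 * i] + [1.0] * 5 for i in range(20)]
    print(rce_ife_sketch(data, labels))  # -> [0]
```

On this toy input, only the informative column survives, whatever seeds are drawn. On real expression or metagenomics data, the grouping and scoring stand-ins would need to be replaced by the clustering and classifier-based scoring choices that the paper actually evaluates.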

ISSN: 2376-5992
Volume: 11
Article number: e2528
DOI: 10.7717/peerj-cs.2528
Subjects: Feature grouping; Feature selection; Recursive cluster elimination; Intra-cluster feature elimination; Disease
Affiliations:
Cihan Kuzudisli: Department of Computer Engineering, Faculty of Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
Burcu Bakir-Gungor: Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
Bahjat Qaqish: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States
Malik Yousef: Department of Information Systems, Zefat Academic College, Zefat, Israel