Hybridization of DEBOHID with ENN algorithm for highly imbalanced datasets
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-03-01 |
Series: | Engineering Science and Technology, an International Journal |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S221509862500031X |
Summary: | Machine learning algorithms typically assume that datasets are balanced, but most real-world datasets are imbalanced. Class imbalance is a major challenge in machine learning and data mining. Oversampling and undersampling methods are commonly used to address this issue. Edited Nearest Neighbor (ENN) and Synthetic Minority Oversampling Technique (SMOTE) are essential methods for undersampling and oversampling, respectively. DEBOHID is a recently proposed differential evolution-based oversampling approach for highly imbalanced datasets. In this work, the DEBOHID and ENN methods are combined into a novel hybrid method called D-ENN. The performance of D-ENN was evaluated on 44 highly imbalanced datasets. A parameter analysis was conducted to determine the optimal values of the F, CR and D-ENN-Type parameters. Three classifiers were used in the study: Support Vector Machines (SVM), Decision Tree (DT), and k-Nearest Neighbor (kNN); G-mean and Area Under Curve (AUC) values are reported for each. Based on the average Winner, Mean Rank and Final Rank values obtained for each classifier and metric pair, the proposed D-ENN method outperformed nine state-of-the-art sampling methods, with an average Winner value of 13, an average Mean Rank value of 3.40 and an average Final Rank value of 1. |
---|---|
ISSN: | 2215-0986 |
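
The summary describes a two-stage idea: oversample the minority class with a differential evolution-style operator, then clean the result with ENN, and score classifiers with G-mean. The following is a minimal sketch of that general pattern, not the authors' D-ENN implementation. It assumes that F and CR are the standard differential evolution scale factor and crossover rate, that labels are 0 (majority) and 1 (minority), and that ENN is applied to every sample. All function names are hypothetical, and the D-ENN-Type parameter is not modeled because this record does not define it.

```python
import numpy as np

def de_oversample(X_min, n_new, F=0.5, CR=0.9, seed=None):
    """Create n_new synthetic minority samples with a DE/rand/1-style
    mutation and binomial crossover (assumed scheme, not from the paper)."""
    rng = np.random.default_rng(seed)
    n, d = X_min.shape
    if n < 3:
        raise ValueError("need at least 3 minority samples")
    out = np.empty((n_new, d))
    for i in range(n_new):
        target = X_min[rng.integers(n)]
        a, b, c = X_min[rng.choice(n, size=3, replace=False)]
        mutant = a + F * (b - c)            # differential mutation
        cross = rng.random(d) < CR          # binomial crossover mask
        cross[rng.integers(d)] = True       # take at least one mutant feature
        out[i] = np.where(cross, mutant, target)
    return out

def enn_clean(X, y, k=3):
    """Edited Nearest Neighbor: drop every sample whose label disagrees
    with the majority label of its k nearest neighbours."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # a sample is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]
    keep = np.array([np.bincount(y[nn[i]], minlength=2).argmax() == y[i]
                     for i in range(len(y))])
    return X[keep], y[keep]

def hybrid_sketch(X, y, F=0.5, CR=0.9, k=3, seed=0):
    """Oversample class 1 up to the size of class 0, then apply ENN."""
    X_min = X[y == 1]
    n_new = max(0, int(np.sum(y == 0)) - len(X_min))
    synth = de_oversample(X_min, n_new, F, CR, seed)
    X_bal = np.vstack([X, synth])
    y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return enn_clean(X_bal, y_bal, k)

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels."""
    sens = np.sum((y_true == 1) & (y_pred == 1)) / np.sum(y_true == 1)
    spec = np.sum((y_true == 0) & (y_pred == 0)) / np.sum(y_true == 0)
    return float(np.sqrt(sens * spec))
```

G-mean is used here because plain accuracy can look high on imbalanced data even when the minority class is mostly misclassified; the geometric mean drops to zero if either class is entirely missed.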