Hybridization of DEBOHID with ENN algorithm for highly imbalanced datasets

Machine learning algorithms assume that datasets are balanced, but most of the datasets in the real world are imbalanced. Class imbalance is a major challenge in machine learning and data mining. Oversampling and undersampling methods are commonly used to address this issue. Edited Nearest Neighbor...

Bibliographic Details
Main Author: Sedat Korkmaz
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: Engineering Science and Technology, an International Journal
Subjects: Imbalanced Learning; DEBOHID; ENN; Oversampling; Undersampling
Online Access: http://www.sciencedirect.com/science/article/pii/S221509862500031X
_version_ 1825199386665156608
author Sedat Korkmaz
collection DOAJ
description Many machine learning algorithms assume that datasets are balanced, but most real-world datasets are imbalanced. Class imbalance is a major challenge in machine learning and data mining. Oversampling and undersampling methods are commonly used to address this issue. Edited Nearest Neighbor (ENN) and Synthetic Minority Oversampling Technique (SMOTE) are essential methods for undersampling and oversampling, respectively. DEBOHID is a recently proposed differential evolution-based oversampling approach for highly imbalanced datasets. In this work, the DEBOHID and ENN methods are combined into a novel hybrid method called D-ENN. The performance of D-ENN was evaluated on 44 highly imbalanced datasets. A parameter analysis was conducted on D-ENN to determine the optimal values of the F, CR and D-ENN-Type parameters. Three classifiers were used in the study: Support Vector Machines (SVM), Decision Tree (DT), and k-Nearest Neighbor (kNN); their G-mean and Area Under the Curve (AUC) values are reported. Based on the average Winner, Mean Rank and Final Rank values obtained for each classifier and metric pair, the proposed D-ENN method outperformed nine state-of-the-art sampling methods, with an average Winner value of 13, an average Mean Rank value of 3.40 and an average Final Rank value of 1.
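The oversample-then-clean idea in the description can be illustrated with a minimal NumPy sketch. It is not the author's implementation: it assumes a DE/rand/1 mutation with binomial crossover over minority-class neighbours for the oversampling step and Wilson-style editing with k = 3 and binary labels for ENN. The function names, the clean_label option and the default F, CR and k values are hypothetical, and the record's D-ENN-Type parameter is not modelled.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of every row, excluding the row itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def de_oversample(X_min, n_new, F=0.5, CR=0.9, k=5, rng=None):
    """Create n_new synthetic minority points with a DE/rand/1/bin-style step (assumed scheme)."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    if k < 3:
        raise ValueError("need at least 4 minority samples")
    nbrs = knn_indices(X_min, k)
    n_feat = X_min.shape[1]
    out = np.empty((n_new, n_feat))
    for j in range(n_new):
        i = rng.integers(len(X_min))                      # target minority sample
        r1, r2, r3 = rng.choice(nbrs[i], size=3, replace=False)
        mutant = X_min[r1] + F * (X_min[r2] - X_min[r3])  # DE mutation
        cross = rng.random(n_feat) < CR                   # binomial crossover mask
        cross[rng.integers(n_feat)] = True                # at least one feature from the mutant
        out[j] = np.where(cross, mutant, X_min[i])
    return out

def enn_keep_mask(X, y, k=3, clean_label=None):
    """Keep samples whose label matches more than half of their k nearest neighbours.

    If clean_label is given (hypothetical option), only that class can lose samples.
    """
    votes = y[knn_indices(X, k)]                          # neighbour labels, shape (n, k)
    agrees = (votes == y[:, None]).sum(axis=1) * 2 > k
    if clean_label is not None:
        agrees = agrees | (y != clean_label)
    return agrees

def d_enn_sketch(X, y, minority=1, F=0.5, CR=0.9, seed=0):
    """Balance the classes with DE-style oversampling, then clean the result with ENN."""
    X_min = X[y == minority]
    n_new = max(int((y != minority).sum()) - len(X_min), 0)
    X_syn = de_oversample(X_min, n_new, F=F, CR=CR, rng=seed)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.full(n_new, minority)])
    keep = enn_keep_mask(X_bal, y_bal, k=3)
    return X_bal[keep], y_bal[keep]
```

Calling d_enn_sketch(X, y) on a feature matrix X and a 0/1 label vector y returns the resampled arrays. The F and CR defaults here are placeholders; the tuned values come from the article's parameter analysis and are not stated in this record.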
format Article
id doaj-art-02ad655c61f0441686f5696ea48df486
institution Kabale University
issn 2215-0986
language English
publishDate 2025-03-01
publisher Elsevier
record_format Article
series Engineering Science and Technology, an International Journal
spelling doaj-art-02ad655c61f0441686f5696ea48df486 | 2025-02-08T05:00:32Z | eng | Elsevier | Engineering Science and Technology, an International Journal | 2215-0986 | 2025-03-01 | 63 | 101976 | Hybridization of DEBOHID with ENN algorithm for highly imbalanced datasets | Sedat Korkmaz (Konya Technical University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Konya, Turkey) | http://www.sciencedirect.com/science/article/pii/S221509862500031X | Imbalanced Learning; DEBOHID; ENN; Oversampling; Undersampling
spellingShingle Sedat Korkmaz
Hybridization of DEBOHID with ENN algorithm for highly imbalanced datasets
Engineering Science and Technology, an International Journal
Imbalanced Learning
DEBOHID
ENN
Oversampling
Undersampling
title Hybridization of DEBOHID with ENN algorithm for highly imbalanced datasets
topic Imbalanced Learning
DEBOHID
ENN
Oversampling
Undersampling
url http://www.sciencedirect.com/science/article/pii/S221509862500031X
work_keys_str_mv AT sedatkorkmaz hybridizationofdebohidwithennalgorithmforhighlyimbalanceddatasets