A Novel Ensemble Classifier Selection Method for Software Defect Prediction

The presence of software defects significantly impacts the quality of software systems and increases development and maintenance costs. To improve system quality and reduce costs, it is necessary to predict software defects in the early stages of the software development lifecycle. This paper propos...

Bibliographic Details
Main Authors: Xin Dong, Jie Wang, Yan Liang
Format: Article
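Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Diversity measures; double fault disagreement; ensemble classifier selection; imbalanced data classification; software defect prediction
Online Access: https://ieeexplore.ieee.org/document/10869442/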
author Xin Dong
Jie Wang
Yan Liang
collection DOAJ
description The presence of software defects significantly impacts the quality of software systems and increases development and maintenance costs. To improve system quality and reduce costs, it is necessary to predict software defects in the early stages of the software development lifecycle. This paper proposes Double Fault Disagreement (DFD), a novel diversity metric and method for selecting competent base classifiers for ensemble learning-based software defect prediction. To account for the diversity of the base learners, several highly diverse base learners are chosen to build the ensemble. The method exploits the diversity of the base learners, leverages their classification ability, optimizes classifier selection for ensemble learning, and enhances the predictive performance of the ensemble model. The experimental results show that the DFD ensemble learning-based software defect prediction model outperforms ten other models: five common machine learning (ML) classification algorithms (logistic regression (LR), naïve Bayes (NB), K-nearest neighbor (KNN), decision tree (DT), and support vector machine (SVM)), two deep learning (DL) algorithms (multi-layer perceptron (MLP) and convolutional neural network (CNN)), and three ensemble learning algorithms (random forest (RF), extreme gradient boosting (XGB), and stacking). The DFD model achieves superior performance on eight public NASA and PROMISE datasets (six of which are imbalanced) across five performance indicators: area under the curve (AUC), geometric mean (G-Mean), F1 score, Matthews correlation coefficient (MCC), and Balance. Furthermore, the DFD method needs only a small number of base learners to converge rapidly, thereby reducing the amount of data and computation time required.
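The threshold-based indicators named in the description can be illustrated with a minimal Python sketch. It is not taken from the article: the function name `defect_metrics` and its zero-division fallbacks are assumptions made here, and the formulas are the standard textbook definitions (Balance is the normalized distance from the ideal point PD = 1, PF = 0 as commonly used in defect prediction studies). AUC is omitted because it requires ranked prediction scores rather than confusion-matrix counts.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Compute G-Mean, F1, MCC and Balance from confusion-matrix counts.

    The defective class is treated as the positive class.
    Hypothetical helper for illustration; undefined ratios fall back to 0.0.
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0          # recall, probability of detection
    pf = fp / (fp + tn) if (fp + tn) else 0.0          # probability of false alarm
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * pd / (precision + pd) if (precision + pd) else 0.0

    # G-Mean: geometric mean of recall and specificity (1 - PF)
    g_mean = math.sqrt(pd * (1.0 - pf))

    # MCC: correlation between predicted and actual labels
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Balance: 1 minus the normalized distance to the ideal point (PF=0, PD=1)
    balance = 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)

    return {"g_mean": g_mean, "f1": f1, "mcc": mcc, "balance": balance}


if __name__ == "__main__":
    # Made-up counts for an imbalanced test set (30 defective, 170 clean modules)
    print(defect_metrics(tp=20, fp=15, tn=155, fn=10))
```

G-Mean, MCC and Balance are often reported alongside F1 on imbalanced datasets because a classifier that labels every module as clean scores zero on G-Mean and MCC even though its plain accuracy would be high.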
format Article
id doaj-art-596f7c5e94b942bc919b31684de0db98
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-596f7c5e94b942bc919b31684de0db98
2025-02-12T00:02:41Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2025-01-01, vol. 13, pp. 25578-25597
doi 10.1109/ACCESS.2025.3537658
document 10869442
A Novel Ensemble Classifier Selection Method for Software Defect Prediction
Xin Dong (0) https://orcid.org/0000-0001-9060-0103
Jie Wang (1)
Yan Liang (2) https://orcid.org/0000-0002-1617-457X
(0) Computer Engineering College, Chengdu Technological University, Chengdu, China
(1) Science and Technology on Electronic Information Control Laboratory, Chengdu, China
(2) Computer Engineering College, Chengdu Technological University, Chengdu, China
title A Novel Ensemble Classifier Selection Method for Software Defect Prediction
topic Diversity measures
double fault disagreement
ensemble classifier selection
imbalanced data classification
software defect prediction
url https://ieeexplore.ieee.org/document/10869442/