A Novel Ensemble Classifier Selection Method for Software Defect Prediction
The presence of software defects significantly impacts the quality of software systems and increases development and maintenance costs. To improve system quality and reduce costs, it is necessary to predict software defects in the early stages of the software development lifecycle. This paper propos...
Main Authors: | Xin Dong, Jie Wang, Yan Liang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
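Subjects: | Diversity measures; double fault disagreement; ensemble classifier selection; imbalanced data classification; software defect prediction |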
Online Access: | https://ieeexplore.ieee.org/document/10869442/ |
_version_ | 1823857129940844544 |
---|---|
author | Xin Dong, Jie Wang, Yan Liang |
author_sort | Xin Dong |
collection | DOAJ |
description | The presence of software defects significantly impacts the quality of software systems and increases development and maintenance costs. To improve system quality and reduce costs, it is necessary to predict software defects in the early stages of the software development lifecycle. This paper proposes Double Fault Disagreement (DFD), a novel diversity metric and method for selecting competent base classifiers for ensemble learning-based software defect prediction. To consider the diversity features of the base learners, several base learners with strong diversity are chosen to build ensemble learning. This method makes full use of the diversity characteristics of base learners, leverages their classification ability, optimizes the selection method for ensemble learning, and enhances the predictive performance of the ensemble model. The experimental results demonstrate that the DFD ensemble learning-based software defect prediction model outperforms the ten other models, including five common machine learning (ML) classification algorithms (logistic regression (LR), naïve Bayes (NB), K-nearest neighbor (KNN), decision tree (DT), and support vector machine (SVM)), two deep learning (DL) algorithms (multi-layer perceptron (MLP) and convolutional neural network (CNN)), and three ensemble learning algorithms (random forest (RF), extreme gradient boosting (XGB), and stacking). The DFD model achieves superior performance on eight public NASA and PROMISE datasets (six of which are imbalanced) across five performance indicators, including area under the curve (AUC), geometric mean (G-Mean), F1 score, Matthews correlation coefficient (MCC), and Balance. Furthermore, the DFD method is not only highly performant but also requires a small number of base learners to converge rapidly, thereby reducing the amount of data and computation time required. |
format | Article |
id | doaj-art-596f7c5e94b942bc919b31684de0db98 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-596f7c5e94b942bc919b31684de0db98; 2025-02-12T00:02:41Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 25578-25597; DOI 10.1109/ACCESS.2025.3537658; IEEE document 10869442; A Novel Ensemble Classifier Selection Method for Software Defect Prediction; Xin Dong (https://orcid.org/0000-0001-9060-0103), Computer Engineering College, Chengdu Technological University, Chengdu, China; Jie Wang, Science and Technology on Electronic Information Control Laboratory, Chengdu, China; Yan Liang (https://orcid.org/0000-0002-1617-457X), Computer Engineering College, Chengdu Technological University, Chengdu, China; https://ieeexplore.ieee.org/document/10869442/; Diversity measures; double fault disagreement; ensemble classifier selection; imbalanced data classification; software defect prediction |
title | A Novel Ensemble Classifier Selection Method for Software Defect Prediction |
topic | Diversity measures; double fault disagreement; ensemble classifier selection; imbalanced data classification; software defect prediction |
url | https://ieeexplore.ieee.org/document/10869442/ |