A Novel Ensemble Classifier Selection Method for Software Defect Prediction

The presence of software defects significantly impacts the quality of software systems and increases development and maintenance costs. To improve system quality and reduce costs, it is necessary to predict software defects in the early stages of the software development lifecycle. This paper propos...

Bibliographic Details
Main Authors:	Xin Dong, Jie Wang, Yan Liang
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Diversity measures; double fault disagreement; ensemble classifier selection; imbalanced data classification; software defect prediction
Online Access:	https://ieeexplore.ieee.org/document/10869442/

collection	DOAJ
description	Software defects significantly degrade the quality of software systems and increase development and maintenance costs. To improve system quality and reduce costs, defects need to be predicted in the early stages of the software development lifecycle. This paper proposes Double Fault Disagreement (DFD), a novel diversity metric and method for selecting competent base classifiers for ensemble learning-based software defect prediction. To account for the diversity of the base learners, several highly diverse base learners are selected to build the ensemble. The method exploits the diversity of the base learners, leverages their classification ability, optimizes classifier selection for ensemble learning, and improves the predictive performance of the ensemble model. The experimental results show that the DFD ensemble learning-based software defect prediction model outperforms ten other models: five common machine learning (ML) classification algorithms (logistic regression (LR), naïve Bayes (NB), K-nearest neighbor (KNN), decision tree (DT), and support vector machine (SVM)), two deep learning (DL) algorithms (multi-layer perceptron (MLP) and convolutional neural network (CNN)), and three ensemble learning algorithms (random forest (RF), extreme gradient boosting (XGB), and stacking). The DFD model achieves superior performance on eight public NASA and PROMISE datasets (six of which are imbalanced) across five performance indicators: area under the curve (AUC), geometric mean (G-Mean), F1 score, Matthews correlation coefficient (MCC), and Balance. Furthermore, the DFD method not only performs well but also converges rapidly with a small number of base learners, thereby reducing the amount of data and computation time required.
id	doaj-art-596f7c5e94b942bc919b31684de0db98
institution	Kabale University
issn	2169-3536
spelling	doaj-art-596f7c5e94b942bc919b31684de0db98; 2025-02-12T00:02:41Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 25578-25597; DOI 10.1109/ACCESS.2025.3537658; document 10869442
	Title: A Novel Ensemble Classifier Selection Method for Software Defect Prediction
	Authors: Xin Dong (https://orcid.org/0000-0001-9060-0103), Computer Engineering College, Chengdu Technological University, Chengdu, China; Jie Wang, Science and Technology on Electronic Information Control Laboratory, Chengdu, China; Yan Liang (https://orcid.org/0000-0002-1617-457X), Computer Engineering College, Chengdu Technological University, Chengdu, China
	Online access: https://ieeexplore.ieee.org/document/10869442/
	Keywords: Diversity measures; double fault disagreement; ensemble classifier selection; imbalanced data classification; software defect prediction

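
Four of the five performance indicators listed in the description (G-Mean, F1 score, MCC, and Balance) can be computed directly from a binary confusion matrix. The sketch below uses the definitions commonly found in the defect prediction literature; it is assumed, not confirmed by this record, that the paper uses the same ones. AUC is omitted because it needs ranked scores rather than counts. The function name and the example counts are hypothetical.

```python
import math


def defect_metrics(tp, fn, fp, tn):
    """Compute G-Mean, F1, MCC, and Balance from confusion-matrix counts.

    The defective class is treated as the positive class.
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0   # recall / probability of detection
    pf = fp / (fp + tn) if (fp + tn) else 0.0   # probability of false alarm
    tnr = tn / (fp + tn) if (fp + tn) else 0.0  # specificity

    g_mean = math.sqrt(pd * tnr)

    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # Balance: 1 minus the normalized distance from the ideal point (pf=0, pd=1).
    balance = 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)

    return {"g_mean": g_mean, "f1": f1, "mcc": mcc, "balance": balance}


# Hypothetical imbalanced example: 10 defective and 90 clean modules.
scores = defect_metrics(tp=8, fn=2, fp=10, tn=80)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced data such as this example, plain accuracy would be dominated by the clean class, which is why indicators like G-Mean, MCC, and Balance are commonly reported instead.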

Similar Items