Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance


Bibliographic Details
Main Authors: Jose L. M. Perez, Roberto S. M. Barros, Silas G. T. C. Santos
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10870227/
_version_ 1823859623077085184
author Jose L. M. Perez
Roberto S. M. Barros
Silas G. T. C. Santos
author_facet Jose L. M. Perez
Roberto S. M. Barros
Silas G. T. C. Santos
author_sort Jose L. M. Perez
collection DOAJ
description Machine learning algorithms that assist in decision-making are becoming crucial in several areas, such as healthcare, finance, and marketing. Algorithms exposed to larger and more relevant amounts of training data tend to perform better. However, obtaining labeled data without human expert intervention is challenging, especially in data stream learning with concept drifts, where data is generated rapidly, in real time, and the data distribution may change. Concept drift occurs in supervised, semi-supervised, and unsupervised learning environments, and is addressed through different approaches, such as statistics and machine learning. Currently, the use of drift detectors with base classifiers in semi-supervised learning is uncommon. Semi-supervised classifiers often consume considerable memory and run-time, and adding a detection mechanism increases the computational cost. Furthermore, classification in semi-supervised environments can lead to problems related to labeling data for training: an error in this process can negatively impact model performance. This article investigates the use of supervised concept drift detectors in semi-supervised learning problems, highlighting how the detectors can improve classification performance. It also explores the influence of diversity in classifier ensembles, showing that increased diversity contributes to enhanced accuracy and robustness of models in concept drift scenarios. Additionally, it introduces a self-training approach to provide more labels and optimize model learning and adaptation. This research also details updates to the Massive Online Analysis (MOA) framework that support the simulation of semi-supervised scenarios. The experiments conducted to test the proposed approach used Hoeffding Tree (HT) and Naive Bayes (NB) as base classifiers, which were also employed as members of the ensembles used in this research.
These classifiers were combined with several detectors and tested on a total of 84 artificial and five real-world datasets. The experiments were conducted with 15% and 30% of labeled data, the main percentages addressed in this research, while 100% was used to provide additional grounding in some cases. The results indicate that detectors created for supervised learning can be effectively used in semi-supervised environments. Furthermore, the tests using the introduced self-training approach demonstrate that the inclusion of additional labels significantly improves the performance of the classifiers. These findings may lead to a paradigm shift in future research, as many researchers have not considered concept drift detectors a viable alternative due to the limited presence of labels in most real-world data streams.
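The abstract above pairs supervised drift detectors with HT and NB base classifiers. As an illustrative aside, not the paper's exact configuration, a minimal sketch of one classic detector of that family, DDM (Drift Detection Method), which monitors the stream of the classifier's prediction errors and signals drift when the error rate rises significantly above its historical minimum:

```python
import math

class DDM:
    """Minimal sketch of the Drift Detection Method (DDM): track the
    running error rate p and its std. dev. s, remember the smallest
    p + s seen, and flag a warning (2 sigma) or a drift (3 sigma)
    when the current p + s rises above that minimum."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self._reset()

    def _reset(self):
        self.n = 0
        self.p = 1.0                  # running error rate
        self.s = 0.0                  # std. dev. of the estimate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the base classifier misclassified, else 0.
        Returns 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + 3 * self.s_min:
            self._reset()                            # start a new concept
            return "drift"
        if self.p + self.s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

Feeding it a stream whose error rate jumps, say from 10% to 50%, triggers a drift shortly after the change. In a semi-supervised setting the error signal is only available for the labeled fraction of the stream, which is precisely the constraint the article studies.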
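The self-training idea mentioned in the abstract, pseudo-labeling high-confidence unlabeled instances and feeding them back as training data, can be sketched generically. The `NearestCentroid` model and the margin-based confidence below are illustrative stand-ins, not the paper's actual classifiers or labeling criterion:

```python
class NearestCentroid:
    """Toy incremental classifier: one centroid per class (an
    illustrative stand-in for the HT/NB base classifiers)."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def fit_one(self, x, y):
        self.sums[y] = self.sums.get(y, 0.0) + x
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict_conf(self, x):
        """Return (predicted class, margin); the gap between the two
        nearest centroids serves as a pseudo-confidence score."""
        dists = {y: abs(x - self.sums[y] / self.counts[y]) for y in self.sums}
        pred = min(dists, key=dists.get)
        rest = [d for y, d in dists.items() if y != pred]
        margin = min(rest) - dists[pred] if rest else 0.0
        return pred, margin

def self_train(model, unlabeled, threshold):
    """Pseudo-label each unlabeled instance whose confidence margin
    exceeds threshold and feed it back as training data; low-confidence
    instances are skipped to limit label-noise propagation."""
    added = 0
    for x in unlabeled:
        pred, margin = model.predict_conf(x)
        if margin >= threshold:
            model.fit_one(x, pred)
            added += 1
    return added
```

The threshold trades off the number of extra labels gained against the risk of reinforcing the model's own mistakes, the labeling-error concern the abstract raises about semi-supervised classification.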
format Article
id doaj-art-8a99da40447d425f8f8dcf219b2faad9
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-8a99da40447d425f8f8dcf219b2faad9
2025-02-11T00:01:35Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
Vol. 13, pp. 24681-24697
doi: 10.1109/ACCESS.2025.3538710
Article number: 10870227
Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
Jose L. M. Perez (https://orcid.org/0009-0000-7914-0652), Centro de Informática, Universidade Federal de Pernambuco, Cidade Universitária, Recife, Brazil
Roberto S. M. Barros (https://orcid.org/0000-0002-3127-822X), Centro de Informática, Universidade Federal de Pernambuco, Cidade Universitária, Recife, Brazil
Silas G. T. C. Santos (https://orcid.org/0000-0002-9758-7543), Centro de Informática, Universidade Federal de Pernambuco, Cidade Universitária, Recife, Brazil
https://ieeexplore.ieee.org/document/10870227/
Data stream; concept drift detectors; semi-supervised learning; ensembles; self-training
spellingShingle Jose L. M. Perez
Roberto S. M. Barros
Silas G. T. C. Santos
Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
IEEE Access
Data stream
concept drift detectors
semi-supervised learning
ensembles
self-training
title Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_full Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_fullStr Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_full_unstemmed Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_short Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_sort enhancing semi supervised learning with concept drift detection and self training a study on classifier diversity and performance
topic Data stream
concept drift detectors
semi-supervised learning
ensembles
self-training
url https://ieeexplore.ieee.org/document/10870227/
work_keys_str_mv AT joselmperez enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance
AT robertosmbarros enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance
AT silasgtcsantos enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance