Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance


Bibliographic Details
Main Authors: Jose L. M. Perez, Roberto S. M. Barros, Silas G. T. C. Santos
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10870227/
_version_ 1823859623077085184
author Jose L. M. Perez
Roberto S. M. Barros
Silas G. T. C. Santos
author_facet Jose L. M. Perez
Roberto S. M. Barros
Silas G. T. C. Santos
author_sort Jose L. M. Perez
collection DOAJ
description Machine learning algorithms that assist in decision-making are becoming crucial in several areas, such as healthcare, finance, and marketing. Algorithms exposed to larger and more relevant amounts of training data tend to perform better. However, obtaining labeled data without human expert intervention is challenging, especially in data stream learning with concept drifts, where data is generated rapidly, in real time, and the data distribution may change. Concept drift occurs in supervised, semi-supervised, and unsupervised learning environments, and is addressed through different approaches, such as statistics and machine learning. Currently, the use of drift detectors with base classifiers in semi-supervised learning is uncommon. Semi-supervised classifiers often consume considerable memory and run-time, and adding a detection mechanism increases the computational cost. Furthermore, classification in semi-supervised environments can lead to problems related to labeling data for training: an error in this process can negatively impact model performance. This article investigates the use of supervised concept drift detectors in semi-supervised learning problems, highlighting how the detectors can improve classification performance. It also explores the influence of diversity in classifier ensembles, showing that increased diversity contributes to enhanced accuracy and robustness of models in concept drift scenarios. Additionally, it introduces a self-training approach to provide more labels and optimize model learning and adaptation. This research also details updates to the Massive Online Analysis (MOA) framework that support the simulation of semi-supervised scenarios. The experiments conducted to test the proposed approach used Hoeffding Tree (HT) and Naive Bayes (NB) as base classifiers, which were also employed as members of the ensembles used in this research.
These classifiers were combined with several detectors and tested on a total of 84 artificial and five real-world datasets. The experiments were conducted with 15% and 30% of labeled data, the main percentages addressed in this research, while 100% was used to provide additional grounding in some cases. The results indicate that detectors created for supervised learning can be effectively used in semi-supervised environments. Furthermore, the tests using the introduced self-training approach demonstrate that the inclusion of additional labels significantly improves the performance of the classifiers. These findings may lead to a paradigm shift in future research, as many researchers have not considered concept drift detectors a viable alternative due to the limited presence of labels in most real-world data streams.
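The abstract above pairs supervised drift detectors with HT and NB base classifiers. As an illustrative aside, not the paper's exact configuration, a minimal sketch of one classic detector of that family, DDM (Drift Detection Method), which monitors the stream of the classifier's prediction errors and signals drift when the error rate rises significantly above its historical minimum:

```python
import math

class DDM:
    """Minimal sketch of the Drift Detection Method (DDM): track the
    running error rate p and its std. dev. s, remember the smallest
    p + s seen, and flag a warning (2 sigma) or a drift (3 sigma)
    when the current p + s rises above that minimum."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self._reset()

    def _reset(self):
        self.n = 0
        self.p = 1.0                  # running error rate
        self.s = 0.0                  # std. dev. of the estimate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the base classifier misclassified, else 0.
        Returns 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + 3 * self.s_min:
            self._reset()                            # start a new concept
            return "drift"
        if self.p + self.s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

Feeding it a stream whose error rate jumps, say from 10% to 50%, triggers a drift shortly after the change. In a semi-supervised setting the error signal is only available for the labeled fraction of the stream, which is precisely the constraint the article studies.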
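The self-training idea mentioned in the abstract, pseudo-labeling high-confidence unlabeled instances and feeding them back as training data, can be sketched generically. The `NearestCentroid` model and the margin-based confidence below are illustrative stand-ins, not the paper's actual classifiers or labeling criterion:

```python
class NearestCentroid:
    """Toy incremental classifier: one centroid per class (an
    illustrative stand-in for the HT/NB base classifiers)."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def fit_one(self, x, y):
        self.sums[y] = self.sums.get(y, 0.0) + x
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict_conf(self, x):
        """Return (predicted class, margin); the gap between the two
        nearest centroids serves as a pseudo-confidence score."""
        dists = {y: abs(x - self.sums[y] / self.counts[y]) for y in self.sums}
        pred = min(dists, key=dists.get)
        rest = [d for y, d in dists.items() if y != pred]
        margin = min(rest) - dists[pred] if rest else 0.0
        return pred, margin

def self_train(model, unlabeled, threshold):
    """Pseudo-label each unlabeled instance whose confidence margin
    exceeds threshold and feed it back as training data; low-confidence
    instances are skipped to limit label-noise propagation."""
    added = 0
    for x in unlabeled:
        pred, margin = model.predict_conf(x)
        if margin >= threshold:
            model.fit_one(x, pred)
            added += 1
    return added
```

The threshold trades off the number of extra labels gained against the risk of reinforcing the model's own mistakes, the labeling-error concern the abstract raises about semi-supervised classification.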
format Article
id doaj-art-8a99da40447d425f8f8dcf219b2faad9
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-8a99da40447d425f8f8dcf219b2faad9
2025-02-11T00:01:35Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
Vol. 13, pp. 24681-24697
doi: 10.1109/ACCESS.2025.3538710
Article number: 10870227
Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
Jose L. M. Perez (https://orcid.org/0009-0000-7914-0652), Centro de Informática, Universidade Federal de Pernambuco, Cidade Universitária, Recife, Brazil
Roberto S. M. Barros (https://orcid.org/0000-0002-3127-822X), Centro de Informática, Universidade Federal de Pernambuco, Cidade Universitária, Recife, Brazil
Silas G. T. C. Santos (https://orcid.org/0000-0002-9758-7543), Centro de Informática, Universidade Federal de Pernambuco, Cidade Universitária, Recife, Brazil
https://ieeexplore.ieee.org/document/10870227/
Data stream; concept drift detectors; semi-supervised learning; ensembles; self-training
spellingShingle Jose L. M. Perez
Roberto S. M. Barros
Silas G. T. C. Santos
Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
IEEE Access
Data stream
concept drift detectors
semi-supervised learning
ensembles
self-training
title Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_full Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_fullStr Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_full_unstemmed Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_short Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
title_sort enhancing semi supervised learning with concept drift detection and self training a study on classifier diversity and performance
topic Data stream
concept drift detectors
semi-supervised learning
ensembles
self-training
url https://ieeexplore.ieee.org/document/10870227/
work_keys_str_mv AT joselmperez enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance
AT robertosmbarros enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance
AT silasgtcsantos enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance