Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance
Main Authors: | Jose L. M. Perez; Roberto S. M. Barros; Silas G. T. C. Santos |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Data stream; concept drift detectors; semi-supervised learning; ensembles; self-training |
Online Access: | https://ieeexplore.ieee.org/document/10870227/ |
author | Jose L. M. Perez; Roberto S. M. Barros; Silas G. T. C. Santos |
author_facet | Jose L. M. Perez; Roberto S. M. Barros; Silas G. T. C. Santos |
author_sort | Jose L. M. Perez |
collection | DOAJ |
description | Machine learning algorithms that assist in decision-making are becoming crucial in several areas, such as healthcare, finance, and marketing. Algorithms exposed to larger and more relevant amounts of training data tend to perform better. However, obtaining labeled data without human expert intervention is challenging, especially in data stream learning with concept drifts, where data is generated rapidly, in real time, and the data distribution may change. Concept drift occurs in supervised, semi-supervised, and unsupervised learning environments, and is addressed through different approaches, such as statistics and machine learning. Currently, the use of drift detectors with base classifiers in semi-supervised learning is uncommon. Semi-supervised classifiers often consume considerable memory and run-time, and adding a detection mechanism increases the computational cost. Furthermore, classification in semi-supervised environments can lead to problems when labeling data for training: an error in this process can negatively impact model performance. This article investigates the use of supervised concept drift detectors in semi-supervised learning problems, highlighting how the detectors can improve classification performance. It also explores the influence of diversity in classifier ensembles, showing that increased diversity contributes to the accuracy and robustness of models in concept drift scenarios. Additionally, it introduces a self-training approach to provide more labels and optimize model learning and adaptation. This research also details updates to the Massive Online Analysis (MOA) framework that support the simulation of semi-supervised scenarios. The experiments conducted to test the proposed approach used Hoeffding Tree (HT) and Naive Bayes (NB) as base classifiers, which were also employed as members of the ensembles used in this research.
These classifiers were combined with several detectors and tested on a total of 84 artificial and five real-world datasets. The experiments were conducted with 15% and 30% labeled data, the main percentages addressed in this research, while 100% was used to provide additional grounding in some cases. The results indicate that detectors created for supervised learning can be effectively used in semi-supervised environments. Furthermore, the tests using the introduced self-training approach demonstrate that the inclusion of additional labels significantly improves the performance of the classifiers. These findings may lead to a paradigm shift for future research, as many researchers have not considered concept drift detectors a viable alternative due to the limited presence of labels in most real-world data streams. |
format | Article |
id | doaj-art-8a99da40447d425f8f8dcf219b2faad9 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-8a99da40447d425f8f8dcf219b2faad9; 2025-02-11T00:01:35Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 24681-24697; DOI 10.1109/ACCESS.2025.3538710; document 10870227; Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance; Jose L. M. Perez (https://orcid.org/0009-0000-7914-0652), Roberto S. M. Barros (https://orcid.org/0000-0002-3127-822X), Silas G. T. C. Santos (https://orcid.org/0000-0002-9758-7543), all with Centro de Informática, Universidade Federal de Pernambuco, Cidade Universitária, Recife, Brazil; https://ieeexplore.ieee.org/document/10870227/; Data stream; concept drift detectors; semi-supervised learning; ensembles; self-training |
spellingShingle | Jose L. M. Perez; Roberto S. M. Barros; Silas G. T. C. Santos; Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance; IEEE Access; Data stream; concept drift detectors; semi-supervised learning; ensembles; self-training |
title | Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance |
title_full | Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance |
title_fullStr | Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance |
title_full_unstemmed | Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance |
title_short | Enhancing Semi-Supervised Learning With Concept Drift Detection and Self-Training: A Study on Classifier Diversity and Performance |
title_sort | enhancing semi supervised learning with concept drift detection and self training a study on classifier diversity and performance |
topic | Data stream; concept drift detectors; semi-supervised learning; ensembles; self-training |
url | https://ieeexplore.ieee.org/document/10870227/ |
work_keys_str_mv | AT joselmperez enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance AT robertosmbarros enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance AT silasgtcsantos enhancingsemisupervisedlearningwithconceptdriftdetectionandselftrainingastudyonclassifierdiversityandperformance |
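The workflow summarized in the abstract, a base classifier on a partially labeled stream, a supervised drift detector monitoring prediction errors on the labeled portion, and self-training that pseudo-labels confident predictions, can be sketched as follows. This is a minimal illustrative sketch, not the authors' MOA implementation: `SimpleNB` is a toy one-feature Gaussian Naive Bayes and `DDMLike` is a simplified DDM-style detector, both hypothetical stand-ins for the HT/NB classifiers and detectors used in the paper.

```python
import math
import random

class SimpleNB:
    """Toy incremental Gaussian Naive Bayes (one feature, two classes)."""
    def __init__(self):
        self.n = {0: 0, 1: 0}          # per-class counts
        self.mean = {0: 0.0, 1: 0.0}   # per-class running means
        self.m2 = {0: 0.0, 1: 0.0}     # per-class sums of squared deviations

    def learn(self, x, y):
        self.n[y] += 1
        d = x - self.mean[y]
        self.mean[y] += d / self.n[y]
        self.m2[y] += d * (x - self.mean[y])

    def _loglik(self, x, y):
        if self.n[y] < 2:
            return -1e9                # not enough data for this class yet
        var = self.m2[y] / (self.n[y] - 1) + 1e-9
        prior = math.log(self.n[y] / (self.n[0] + self.n[1]))
        return prior - 0.5 * (math.log(2 * math.pi * var)
                              + (x - self.mean[y]) ** 2 / var)

    def predict_proba(self, x):
        a, b = self._loglik(x, 0), self._loglik(x, 1)
        m = max(a, b)
        pa, pb = math.exp(a - m), math.exp(b - m)
        return pb / (pa + pb)          # P(class 1 | x)

class DDMLike:
    """Simplified DDM-style detector: flags drift when the running error
    rate p plus its std s exceeds the best (p + s) seen so far by 3*s_min."""
    def __init__(self):
        self.t, self.p, self.s = 0, 1.0, 0.0
        self.p_min = self.s_min = float("inf")

    def update(self, error):           # error is 0 (correct) or 1 (wrong)
        self.t += 1
        self.p += (error - self.p) / self.t
        self.s = math.sqrt(self.p * (1 - self.p) / self.t)
        if self.t < 30:                # warm-up period
            return False
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        return self.p + self.s > self.p_min + 3 * self.s_min

random.seed(7)
model, detector = SimpleNB(), DDMLike()
drifts = 0
for t in range(4000):
    flipped = t >= 2000                       # abrupt concept drift halfway
    y = random.randint(0, 1)
    mu = (1 - y) if flipped else y            # drift swaps the class means
    x = random.gauss(mu, 0.3)
    p1 = model.predict_proba(x)
    pred = 1 if p1 >= 0.5 else 0
    if random.random() < 0.15:                # ~15% labeled, as in the study
        if detector.update(0 if pred == y else 1):
            model, detector = SimpleNB(), DDMLike()   # adapt: retrain
            drifts += 1
        model.learn(x, y)
    elif max(p1, 1 - p1) > 0.95:              # self-training on confident ones
        model.learn(x, pred)
print("drifts detected:", drifts)
```

The sketch also illustrates the risk the abstract notes: between the drift and its detection, self-training pseudo-labels instances under the old concept, which is exactly the kind of labeling error that can degrade the model until the detector triggers a reset.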