Using Natural Language Processing Models to Automate Text Labelling: Categorising Semantic Density in Preservice Teachers' Lesson Observation Reports




Bibliographic Details
Main Authors:	Thato Senoamadi, Lee Rusznyak, Ritesh Ajoodha
Format:	Article
Language:	English
Published:	Research and Postgraduate Support Directorate, 2024-12-01
Series:	African Journal of Inter-Multidisciplinary Studies
Subjects:	natural language processing; bidirectional encoder representations from transformers; legitimation code theory; semantic density; teacher education
Online Access:	https://journals.uct.ac.za/new_dut/index.php/ajims/article/view/1558

Description:	Education researchers have long had to choose between studies that provide rich insight into teaching and learning in a particular context and studies that reveal broad patterns across large-scale data. Advances in natural language processing models could support research that offers detailed analysis of specific cases while also revealing broader patterns in a much larger dataset. This paper reports on the findings of a study that tested the accuracy of advanced natural language processing models in assigning labels to a qualitative dataset. The dataset for this analysis comes from lesson observation reports written by a cohort of preservice teachers pursuing a Postgraduate Certificate in Education (PGCE). Their responses were manually analysed using Legitimation Code Theory (LCT) and graded from simple descriptive observations to complex observations that suggested an interpretation of teachers’ pedagogic actions. Bidirectional Encoder Representations from Transformers (BERT) and two of its derivatives, DistilBERT and RoBERTa, were trained to recognise the coding decisions made by the researchers on a subset of the empirical data. The study evaluates how effectively the BERT models assign appropriate labels to sections of the dataset by comparing their assigned labels with those allocated manually by the research team. Using a dataset of 2167 manually annotated sections, the models were trained, refined, and tested on labelling the dataset. A comparative analysis of BERT, DistilBERT, and RoBERTa, which achieved accuracy rates between 72% and 78%, offers insights into their strengths, efficiency, and adaptability. These metrics show the current efficacy of the models in coding semantic density in lesson observation reports and open possibilities for analysing massive datasets of similar text. The challenges encountered also reveal potential limitations of this approach.
Affiliation:	University of the Witwatersrand, South Africa (all authors)
Volume/Issue:	6(1)
DOI:	10.51415/ajims.v6i1.1558
ISSN:	2663-4597; 2663-4589
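The abstract evaluates each model by comparing its labels with the labels the research team allocated manually and reports the result as an accuracy rate. The sketch below shows that comparison in plain Python. It is an illustration under stated assumptions, not the authors' evaluation code: the label names (`SD-low`, `SD-mid`, `SD-high`), the five toy sections, and every prediction are hypothetical and are unrelated to the 72% to 78% figures reported in the study.

```python
from collections import Counter


def accuracy(manual, predicted):
    """Share of sections where the model's label matches the manual label."""
    if len(manual) != len(predicted):
        raise ValueError("label lists must be the same length")
    if not manual:
        raise ValueError("no labels to compare")
    matches = sum(m == p for m, p in zip(manual, predicted))
    return matches / len(manual)


def confusion_counts(manual, predicted):
    """Count (manual, predicted) label pairs to show where disagreements fall."""
    return Counter(zip(manual, predicted))


# Hypothetical manual codes for five sections; label names are illustrative only.
manual = ["SD-low", "SD-mid", "SD-high", "SD-mid", "SD-low"]

# Hypothetical model outputs for the same five sections (toy data, not study results).
models = {
    "BERT": ["SD-low", "SD-mid", "SD-mid", "SD-mid", "SD-low"],
    "DistilBERT": ["SD-low", "SD-low", "SD-high", "SD-mid", "SD-mid"],
    "RoBERTa": ["SD-low", "SD-mid", "SD-high", "SD-low", "SD-low"],
}

for name, predicted in models.items():
    # Print each model's agreement with the manual coding as a percentage.
    print(f"{name}: {accuracy(manual, predicted):.0%}")
```

With these toy labels the loop prints 80% for BERT, 60% for DistilBERT, and 80% for RoBERTa. Accuracy alone hides which semantic density levels are confused with each other, so the hypothetical `confusion_counts` helper tallies each (manual, predicted) pair; for example, it shows BERT labelling one `SD-high` section as `SD-mid`.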


Similar Items