Dataset of vocabulary in Uzbek primary education: Extraction and analysis in case of the school corpus

The main goal of this research work is to determine the number of new words that a primary school pupil should know/acquire during each academic year. To accomplish this, we have created two datasets. The first dataset was compiled based on the "Explanatory Vocabulary of the Uzbek Language"...

Bibliographic Details
Main Authors:	Khabibulla Madatov, Sapura Sattarova, Jernej Vičič
Format:	Article
Language:	English
Published:	Elsevier 2025-04-01
Series:	Data in Brief
Subjects:	Lemma; Uzbek language; Primary school; Corpus construction; Natural language processing (NLP); Comparative lemma extraction method
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352340925000812

author	Khabibulla Madatov, Sapura Sattarova, Jernej Vičič
author_facet	Khabibulla Madatov, Sapura Sattarova, Jernej Vičič
author_sort	Khabibulla Madatov
collection	DOAJ
description	The main goal of this research work is to determine the number of new words that a primary school pupil should know/acquire during each academic year. To accomplish this, we have created two datasets. The first dataset was compiled based on the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second dataset was created from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan, and it was named the "Uzbek Primary School Corpus" (UPSC) by the authors. Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors of the article, a vocabulary for grades 1-4 was created, and the problem of determining the number of new words that primary school pupils should learn each academic year was solved. Word forms were disregarded, as Uzbek is a morphologically rich language.
format	Article
id	doaj-art-21969985162946f0b84cc39c675d5c2b
institution	Kabale University
issn	2352-3409
language	English
publishDate	2025-04-01
publisher	Elsevier
record_format	Article
series	Data in Brief
spelling	doaj-art-21969985162946f0b84cc39c675d5c2b | 2025-02-08T05:00:35Z | eng | Elsevier | Data in Brief | 2352-3409 | 2025-04-01 | 59 | 111349 | Dataset of vocabulary in Uzbek primary education: Extraction and analysis in case of the school corpus | Khabibulla Madatov [0] | Sapura Sattarova [1] | Jernej Vičič [2] | [0] Urgench State University, 14, Kh. Alimdjan str, Urgench city, 220100, Uzbekistan | [1] Urgench State University, 14, Kh. Alimdjan str, Urgench city, 220100, Uzbekistan | [2] University of Primorska, FAMNIT, Glagoljaska 8, 6000 Koper, Slovenia; Research Centre of the Slovenian Academy of Sciences and Arts, The Fran Ramovš Institute, Novi trg 2, 1000 Ljubljana, Slovenija; Corresponding author. | The main goal of this research work is to determine the number of new words that a primary school pupil should know/acquire during each academic year. To accomplish this, we have created two datasets. The first dataset was compiled based on the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second dataset was created from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan, and it was named the "Uzbek Primary School Corpus" (UPSC) by authors. Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors of the article, a vocabulary for grades 1-4 was created, and the problem of determining the number of new words (disregarding word forms as Uzbek is a morphologically rich language) that primary school pupils should learn each academic year was solved. | http://www.sciencedirect.com/science/article/pii/S2352340925000812 | Lemma | Uzbek language | Primary school | Corpus construction | Natural language Processing (NLP) | Comparative lemma extraction method
title	Dataset of vocabulary in Uzbek primary education: Extraction and analysis in case of the school corpus
topic	Lemma; Uzbek language; Primary school; Corpus construction; Natural language processing (NLP); Comparative lemma extraction method
url	http://www.sciencedirect.com/science/article/pii/S2352340925000812


Similar Items