Dataset of vocabulary in Uzbek primary education: Extraction and analysis in case of the school corpus

Bibliographic Details
Main Authors:	Khabibulla Madatov, Sapura Sattarova, Jernej Vičič
Format:	Article
Language:	English
Published:	Elsevier 2025-04-01
Series:	Data in Brief
Subjects:	Lemma; Uzbek language; Primary school; Corpus construction; Natural language processing (NLP); Comparative lemma extraction method
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352340925000812

Description
Summary:	The main goal of this research is to determine the number of new words that a primary school pupil should acquire during each academic year. To accomplish this, we created two datasets. The first dataset was compiled from the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second dataset was built from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan, and the authors named it the "Uzbek Primary School Corpus" (UPSC). Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors of the article, a vocabulary for grades 1-4 was created. Because Uzbek is a morphologically rich language, word forms were disregarded and only lemmas were counted, which made it possible to determine the number of new words that primary school pupils should learn in each academic year.
ISSN:	2352-3409
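
The idea of counting new lemmas per grade can be illustrated with a minimal sketch. It is not the authors' CLEM implementation: the suffix list, the toy dictionary, the sample tokens, and the function names toy_lemmatize and count_new_lemmas are all assumptions made for illustration. Each token is reduced to a form attested in a dictionary set, and a lemma counts as new only in the first grade where it appears.

```python
# Hypothetical, heavily simplified suffix list (plural, accusative, dative, locative).
SUFFIXES = ("lar", "ni", "ga", "da")


def toy_lemmatize(token, dictionary):
    """Strip known suffixes until the form is found in the dictionary.

    Returns None when no dictionary form can be reached. This is a toy
    stand-in for a real Uzbek lemmatizer.
    """
    word = token.lower()
    while word not in dictionary:
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: -len(suffix)]
                break
        else:
            return None  # no suffix matched and the form is not in the dictionary
    return word


def count_new_lemmas(grade_tokens, dictionary):
    """Map each grade to the set of lemmas first seen in that grade."""
    seen = set()
    new_by_grade = {}
    for grade in sorted(grade_tokens):
        lemmas = {toy_lemmatize(t, dictionary) for t in grade_tokens[grade]}
        lemmas.discard(None)  # drop tokens with no dictionary lemma
        new_by_grade[grade] = lemmas - seen
        seen |= lemmas
    return new_by_grade


# Illustrative data only; not taken from EDUL or UPSC.
dictionary = {"kitob", "maktab", "daftar", "ona"}
grade_tokens = {
    1: ["kitob", "kitoblar", "ona"],
    2: ["maktabga", "kitobni", "daftarlar"],
}
new_by_grade = count_new_lemmas(grade_tokens, dictionary)
```

In this toy input, "kitoblar" and "kitobni" both reduce to "kitob", so the word is counted once, in grade 1, and does not reappear as new in grade 2.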
