Dataset of vocabulary in Uzbek primary education: Extraction and analysis in case of the school corpus

Bibliographic Details
Main Authors: Khabibulla Madatov, Sapura Sattarova, Jernej Vičič
Format: Article
Language: English
Published: Elsevier 2025-04-01
Series: Data in Brief
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340925000812
Description
Summary: The main goal of this research is to determine the number of new words that a primary school pupil should acquire during each academic year. To accomplish this, we created two datasets. The first was compiled from the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second was built from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan; the authors named it the "Uzbek Primary School Corpus" (UPSC). Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors, a vocabulary for grades 1-4 was created, and the number of new words that primary school pupils should learn each academic year was determined. Word forms were disregarded because Uzbek is a morphologically rich language.
ISSN: 2352-3409
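
The summary frames the task as counting, per grade, the lemmas that pupils meet for the first time. The sketch below illustrates that counting step only and is not the authors' CLEM: it assumes the textbook text has already been lemmatized, that each grade is given as a set of lemmas, and that a lemma counts only if it is attested in the dictionary. The function name `new_lemmas_per_grade` and the toy word lists are hypothetical.

```python
from typing import Dict, List, Set


def new_lemmas_per_grade(
    grade_lemmas: Dict[int, Set[str]],
    dictionary_lemmas: Set[str],
) -> Dict[int, List[str]]:
    """Return, for each grade, the dictionary lemmas first seen in that grade.

    Assumption: inputs are already lemmatized; this is an illustration,
    not the published Comparative Lemma Extraction Method.
    """
    seen: Set[str] = set()
    result: Dict[int, List[str]] = {}
    for grade in sorted(grade_lemmas):
        # Keep only lemmas attested in the dictionary.
        attested = grade_lemmas[grade] & dictionary_lemmas
        # Drop lemmas already introduced in an earlier grade.
        fresh = attested - seen
        result[grade] = sorted(fresh)
        seen |= fresh
    return result


# Toy data (hypothetical, not taken from EDUL or UPSC).
toy_dictionary = {"kitob", "maktab", "qalam", "daftar", "ona"}
toy_grades = {
    1: {"kitob", "ona"},
    2: {"kitob", "maktab", "qalam"},
    3: {"maktab", "daftar", "xyz"},  # "xyz" is not in the toy dictionary
}

new_words = new_lemmas_per_grade(toy_grades, toy_dictionary)
counts = {grade: len(words) for grade, words in new_words.items()}
print(counts)
```

Processing grades in ascending order and keeping a running set of seen lemmas means each lemma is credited to the earliest grade in which it appears, which matches the "new words per academic year" framing of the summary.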