Data Augmentation For Sorani Kurdish News Headline Classification Using Back-Translation And Deep Learning Model

Bibliographic Details
Main Author: Soran Badawi
Format: Article
Language: English
Published: Sulaimani Polytechnic University, 2023-06-01
Series: Kurdistan Journal of Applied Research
Subjects: Data Augmentation; Deep Learning; Text Classification; Machine Learning; Kurdish Language
Online Access: https://kjar.spu.edu.iq/index.php/kjar/article/view/852
author Soran Badawi
collection DOAJ
description With the growing volume of news articles and headlines being published, it is becoming harder for readers to keep up with the latest developments and to find relevant news articles in the Kurdish language. To address this issue, this paper proposes a novel data augmentation approach for improving Kurdish news headline classification that combines back-translation with a proposed deep learning Bidirectional Long Short-Term Memory (BiLSTM) model. The approach generates synthetic training data by translating Kurdish headlines into a target language (in this case, English) and back-translating them into Kurdish, yielding an augmented dataset. The proposed BiLSTM model is trained on the augmented data and compared with baseline Support Vector Machine (SVM) and Naïve Bayes models trained on the original data. The experimental results show that the proposed BiLSTM model outperforms the baseline models and other existing models, achieving state-of-the-art performance on the Kurdish news headline classification task. The findings suggest that combining back-translation with the proposed BiLSTM model is a promising data augmentation approach for low-resource languages, contributing to the advancement of natural language processing in under-resourced languages. Moreover, a Kurdish news headline classification model can improve access to news and information for Kurdish speakers, who can then quickly search for articles in the categories they prefer, such as politics, sports, or entertainment.
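The back-translation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the helper `back_translate_augment` and the toy translators are hypothetical, and in practice `to_pivot` and `from_pivot` would wrap a real Kurdish-English machine translation system, which this record does not name. Each synthetic headline inherits the label of its source headline; skipping round trips that come back unchanged is a choice made for this sketch only.

```python
from typing import Callable, List, Tuple

# A labelled training example: (headline text, category label).
Example = Tuple[str, str]
# Any function that maps a string in one language to a string in another.
Translator = Callable[[str], str]


def back_translate_augment(
    data: List[Example],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Example]:
    """Return the original examples plus back-translated copies (hypothetical helper).

    Each headline is translated into a pivot language (English in the paper)
    and then back into the source language. The synthetic headline keeps the
    label of the headline it came from. Round trips that reproduce an existing
    headline exactly are skipped (an assumption of this sketch).
    """
    augmented = list(data)  # copy so the caller's list is not modified
    seen = {text for text, _ in data}
    for text, label in data:
        paraphrase = from_pivot(to_pivot(text))
        if paraphrase and paraphrase not in seen:
            augmented.append((paraphrase, label))
            seen.add(paraphrase)
    return augmented


# Toy stand-ins for a real MT system (hypothetical): the "pivot language" is
# upper case, and the way back swaps one word for a synonym to mimic the
# paraphrasing that real back-translation often introduces.
def toy_to_pivot(text: str) -> str:
    return text.upper()


def toy_from_pivot(text: str) -> str:
    return text.lower().replace("wins", "beats")


if __name__ == "__main__":
    train = [
        ("team wins final", "sports"),
        ("parliament passes budget", "politics"),
    ]
    for headline, label in back_translate_augment(train, toy_to_pivot, toy_from_pivot):
        print(label, "|", headline)
```

With this input, the sports headline gains a paraphrased copy labelled `sports`, while the politics headline comes back unchanged and is not duplicated.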
format Article
id doaj-art-d97066db684d4d40a3f9d27ab2fed6cd
institution Kabale University
issn 2411-7684
2411-7706
language English
publishDate 2023-06-01
publisher Sulaimani Polytechnic University
record_format Article
series Kurdistan Journal of Applied Research
affiliation Language Center, Charmo University, KRG, Chamchamal, Kurdistan, Iraq
doi 10.24017/science/2023.1.4
volume 8
issue 1
title Data Augmentation For Sorani Kurdish News Headline Classification Using Back-Translation And Deep Learning Model
topic Data Augmentation
Deep Learning
Text Classification
Machine Learning
Kurdish Language
url https://kjar.spu.edu.iq/index.php/kjar/article/view/852