Bilingual hate speech detection on social media: Amharic and Afaan Oromo

Bibliographic Details
Main Authors: Teshome Mulugeta Ababu, Michael Melese Woldeyohannis, Emuye Bawoke Getaneh
Format: Article
Language: English
Published: SpringerOpen 2025-02-01
Series: Journal of Big Data
Online Access: https://doi.org/10.1186/s40537-024-01044-y
Description
Summary: Due to significant increases in internet penetration and the development of smartphone technology over the past couple of decades, many people have started using social media as a communication platform. Social media has grown to be one of the most significant communication channels, with several benefits. However, the technology also poses a number of threats and challenges, such as hate speech, disinformation, and fake news. Social media platforms are often accused of not doing enough to thwart hate speech on their platforms, and hate speech detection is one way to address this. People in bilingual and multinational societies commonly code-mix in both spoken and written communication. Among them, Amharic and Afaan Oromo speakers frequently mix the two languages when conversing and posting on social media. Most previous studies concentrated on identifying hate speech either in technologically favoured languages or in a single Ethiopian language; however, bilingual communication on social media hampers the detection of hate speech that propagates there. In this work, bilingual hate speech detection for Amharic and Afaan Oromo was conducted using four deep learning classifiers (CNN, BiLSTM, CNN-BiLSTM, and BiGRU) and three feature extraction techniques (Keras word embedding, word2vec, and FastText). In the experiments, BiLSTM with FastText feature extraction outperformed the other algorithms, achieving 78.05% accuracy for bilingual Amharic-Afaan Oromo hate speech detection. FastText feature extraction overcomes the out-of-vocabulary (OOV) problem. Furthermore, we are working towards including other linguistic features of the languages to detect hate speech, and towards making the resource available to facilitate further research on bilingual hate speech detection for other under-resourced Ethiopian languages.
ISSN: 2196-1115
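
Note on the OOV claim in the summary: FastText-style embeddings can represent a word that was never seen in training because a word vector is composed from the vectors of its character n-grams. The following is a minimal pure-Python sketch of that idea, not the authors' pipeline and not the fastText library itself. The class name `ToySubwordEmbedding`, the bucket count, the vector dimension, the random (untrained) n-gram table, and the example tokens are all assumptions made for illustration; the n-gram range 3 to 6 mirrors fastText's documented defaults.

```python
import random


def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of '<word>', boundary markers included."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def fnv1a(text):
    """32-bit FNV-1a-style hash over UTF-8 bytes (illustrative, not fastText's exact hash)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


class ToySubwordEmbedding:
    """Hypothetical stand-in for a trained subword table: rows are random, not learned."""

    def __init__(self, dim=8, buckets=1000, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.buckets = buckets
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(buckets)
        ]

    def vector(self, word):
        """Average the bucket vectors of the word's n-grams; works for unseen words."""
        grams = char_ngrams(word)
        if not grams:
            return [0.0] * self.dim
        total = [0.0] * self.dim
        for gram in grams:
            row = self.table[fnv1a(gram) % self.buckets]
            for k in range(self.dim):
                total[k] += row[k]
        return [value / len(grams) for value in total]


if __name__ == "__main__":
    emb = ToySubwordEmbedding()
    # Arbitrary example tokens; neither needs to be in any vocabulary.
    for token in ["nagaa", "ሰላም"]:
        print(token, len(char_ngrams(token)), emb.vector(token)[:3])
```

Because the lookup goes through n-gram buckets rather than a fixed word list, any token, including a code-mixed or misspelled one, yields a vector. In a trained model the bucket rows are learned, so words that share n-grams end up with related vectors.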