Headline-Guided Extractive Summarization for Thai News Articles

Bibliographic Details
Main Authors: Pimpitchaya Kositcharoensuk, Nakarin Sritrakool, Ploy N. Pratanwanich
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10870350/
Description: Text summarization is a process of condensing lengthy texts while preserving their essential information. Previous studies have predominantly focused on high-resource languages, while low-resource languages like Thai have received less attention. Furthermore, earlier extractive summarization models for Thai texts have primarily relied on the article’s body, without considering the headline. This omission can result in the exclusion of key sentences from the summary. To address these limitations, we propose CHIMA, an extractive summarization model that incorporates the contextual information of the headline for Thai news articles. Our model utilizes a pre-trained language model to capture complex language semantics and assigns a probability to each sentence to be included in the summary. By leveraging the headline to guide sentence selection, CHIMA enhances the model’s ability to recover important sentences and discount irrelevant ones. Additionally, we introduce two strategies for aggregating headline-body similarities, simple average and harmonic mean, providing flexibility in sentence selection to accommodate varying writing styles. Experiments on publicly available Thai news datasets demonstrate that CHIMA outperforms baseline models across ROUGE, BLEU, and F1 scores. These results highlight the effectiveness of incorporating the headline-body similarities as model guidance. The results also indicate an enhancement in the model’s ability to recall critical sentences, even those scattered throughout the middle or end of the article. With this potential, headline-guided extractive summarization offers a promising approach to improve the quality and relevance of summaries for Thai news articles.
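The description names two aggregation strategies, simple average and harmonic mean, but this record does not say exactly which quantities CHIMA aggregates. The following is only an illustrative sketch, not the authors' implementation: it assumes each sentence has a model-assigned probability and a headline similarity, both in [0, 1], and combines the two before picking the top-k sentences. The function names `simple_average`, `harmonic_mean`, and `select_sentences` are hypothetical.

```python
from typing import Callable, List


def simple_average(a: float, b: float) -> float:
    """Arithmetic mean of two scores."""
    return (a + b) / 2.0


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative scores; 0.0 if both are zero."""
    if a + b == 0:
        return 0.0
    return 2.0 * a * b / (a + b)


def select_sentences(
    model_scores: List[float],
    headline_sims: List[float],
    k: int,
    combine: Callable[[float, float], float],
) -> List[int]:
    """Return indices of the k best sentences, in document order.

    Assumption (not stated in the record): the combined score is
    combine(model probability, headline similarity) per sentence.
    """
    combined = [combine(p, s) for p, s in zip(model_scores, headline_sims)]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical scores for a three-sentence article.
model_scores = [0.9, 0.4, 0.6]
headline_sims = [0.1, 0.5, 0.6]

by_average = select_sentences(model_scores, headline_sims, 2, simple_average)
by_harmonic = select_sentences(model_scores, headline_sims, 2, harmonic_mean)
print(by_average, by_harmonic)
```

Under these made-up scores the two strategies pick different sentences: the harmonic mean penalizes the first sentence for its low headline similarity, while the simple average lets its high model probability compensate. That difference is one plausible reading of how the two options could accommodate different writing styles.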
ISSN: 2169-3536
Volume: 13
Pages: 24368-24382
DOI: 10.1109/ACCESS.2025.3538329
Author Affiliations:
Pimpitchaya Kositcharoensuk (https://orcid.org/0009-0002-7166-1001): Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
Nakarin Sritrakool: National Institute of Informatics, Tokyo, Japan
Ploy N. Pratanwanich (https://orcid.org/0000-0003-3684-6967): Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
Subjects: Document analysis; extractive text summarization; information retrieval; natural language processing; natural language understanding; pattern recognition