Deep Learning-Based Feature Extraction Technique for Single Document Summarization Using Hybrid Optimization Technique

Bibliographic Details
Main Authors: Jyotirmayee Rautaray, Sangram Panigrahi, Ajit Kumar Nayak, Premananda Sahu, Kaushik Mishra
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10870163/
Description
Summary: The exponential growth of unstructured data on the web and social networks has made it increasingly difficult for individuals to retrieve relevant information efficiently. Over the years, various text summarization techniques have been developed to address this issue. However, traditional approaches that rely on directly extracting words often produce redundant summaries and fail to establish a strong connection between the summary and the original document. This paper presents a novel Deep Learning (DL)-based text summarization approach comprising four phases: pre-processing, feature extraction, vectorization, and summarization using a hybrid Cat Swarm Optimization (CSO) and Harris Hawk Optimization (HHO) algorithm. First, input documents undergo pre-processing, including sentence segmentation, word tokenization, stop word removal, and lemmatization, to enhance text quality. Features are then extracted using a Restricted Boltzmann Machine (RBM) to obtain nine key attributes. Vectorization is performed using Term Frequency-Inverse Document Frequency (TF-IDF) to represent sentences in vector form. The hybrid CSO-HHO algorithm is then applied to generate summaries. The proposed method was evaluated on datasets from the Document Understanding Conference (DUC), specifically DUC-2002, DUC-2003, and DUC-2005. Performance was assessed using sensitivity, readability, coherence, precision, BLEU score, ROUGE score, and F-score. The results were compared with existing methods, including CSO, QABC, PSO, GJO, and FF, and with machine learning techniques such as SVM and RF. The hybrid CSO-HHO algorithm achieved an accuracy of 99.56%, demonstrating its superiority in text summarization tasks.
ISSN: 2169-3536
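
Illustrative note: the summary describes pre-processing followed by TF-IDF vectorization of sentences. The sketch below shows one possible way those two steps could look, using only the Python standard library. It is not the authors' implementation. The stop-word list, the regular-expression tokenizer, the smoothed IDF formula, and all function names are assumptions made for illustration, and lemmatization, RBM feature extraction, and the CSO-HHO optimizer are omitted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the record does not specify one).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "on", "it"}

def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase word tokenization with stop-word removal (no lemmatization here)."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def tfidf_vectors(sentences):
    """Return (vocabulary, TF-IDF vectors), treating each sentence as one document."""
    tokenized = [tokenize(s) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    # Document frequency: number of sentences that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Assumed IDF variant: log(n / df) + 1, so words shared by all sentences keep weight 1.
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against sentences with no remaining tokens
        vectors.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, vectors

text = "Cats chase mice. Hawks hunt mice in the fields. Summaries are short."
sentences = split_sentences(text)
vocab, vectors = tfidf_vectors(sentences)
```

Each sentence becomes a vector over the shared vocabulary, and words that appear in fewer sentences receive higher weights. A sentence-scoring or optimization step, such as the hybrid CSO-HHO algorithm named in the summary, would then operate on these vectors.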