Deep Learning-Based Feature Extraction Technique for Single Document Summarization Using Hybrid Optimization Technique
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/10870163/ |
Summary: | The exponential growth of unstructured data on the web and social networks has made it increasingly difficult for individuals to retrieve relevant information efficiently. Various text summarization techniques have been developed over the years to address this issue. However, traditional approaches that extract words directly often produce redundant summaries and fail to establish a strong connection between the summary and the original document. This paper presents a novel Deep Learning (DL)-based text summarization approach with four phases: pre-processing, feature extraction, vectorization, and summarization using a hybrid Cat Swarm Optimization (CSO) and Harris Hawk Optimization (HHO) algorithm. Input documents first undergo pre-processing (sentence segmentation, word tokenization, stop word removal, and lemmatization) to enhance text quality. Features are then extracted using a Restricted Boltzmann Machine (RBM) to obtain nine key attributes. Term Frequency-Inverse Document Frequency (TF-IDF) vectorization then represents each sentence in vector form, and the hybrid CSO-HHO algorithm is applied to generate summaries. The proposed method was evaluated on datasets from the Document Understanding Conference (DUC), specifically DUC-2002, DUC-2003, and DUC-2005, using sensitivity, readability, coherence, precision, BLEU score, ROUGE score, and F-score. Its results were compared with existing methods, including CSO, QABC, PSO, GJO, FF, and machine learning techniques such as SVM and RF. The hybrid CSO-HHO algorithm achieved an accuracy of 99.56%, demonstrating its superiority in text summarization tasks. |
---|---|
ISSN: | 2169-3536 |
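
The pre-processing and TF-IDF vectorization phases named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the regex-based segmentation, and the function names `split_sentences`, `tokenize`, and `tfidf_vectors` are assumptions made for illustration, lemmatization is omitted, and the RBM feature extraction and CSO-HHO summarization phases are not shown.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokenization with stop-word removal (no lemmatization)."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def tfidf_vectors(sentences):
    """Return (vocabulary, one TF-IDF vector per sentence).

    Each sentence is treated as a 'document'; idf = log(N / df), unsmoothed.
    """
    tokenized = [tokenize(s) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    # Document frequency: number of sentences containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against sentences with no remaining tokens
        vectors.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, vectors


text = "Cats chase mice. Hawks chase mice too! Summaries are short."
sentences = split_sentences(text)
vocab, vectors = tfidf_vectors(sentences)
```

Each sentence ends up as a vector over the shared vocabulary, which is the kind of representation a downstream optimizer could score and select from. A production pipeline would more likely use an established tokenizer, lemmatizer, and TF-IDF implementation than these hand-rolled helpers.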