Deep Learning-Based Feature Extraction Technique for Single Document Summarization Using Hybrid Optimization Technique

Bibliographic Details
Main Authors: Jyotirmayee Rautaray, Sangram Panigrahi, Ajit Kumar Nayak, Premananda Sahu, Kaushik Mishra
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
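Subjects: Text summarization; single document summarization; pre-processing; feature extraction; vectorization; hybrid CSO-HHO algorithm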
Online Access: https://ieeexplore.ieee.org/document/10870163/
collection DOAJ
description Presently, the exponential growth of unstructured data on the web and social networks has made it increasingly challenging for individuals to retrieve relevant information efficiently. Over the years, various text summarization techniques have been developed to address this issue. However, traditional approaches that rely on directly extracting words often lead to redundancies and fail to establish a strong connection between the summary and the original document. This paper presents a novel Deep Learning (DL)-based text summarization approach incorporating the following phases: pre-processing, feature extraction, vectorization, and summarization using a hybrid Cat Swarm Optimization (CSO) and Harris Hawk Optimization (HHO) algorithm. Initially, input documents undergo pre-processing steps, including sentence segmentation, word tokenization, stop word removal, and lemmatization, to enhance text quality. Features are then extracted using a Restricted Boltzmann Machine (RBM) to obtain nine key attributes. Vectorization is performed using Term Frequency-Inverse Document Frequency (TF-IDF) to represent sentences in vector form. The hybrid CSO-HHO algorithm is subsequently applied to generate summaries. The proposed method’s efficiency was evaluated using datasets from the Document Understanding Conference (DUC), specifically DUC-2002, DUC-2003, and DUC-2005. Metrics such as sensitivity, readability, coherence, precision, BLEU score, ROUGE score, and F-score were analyzed to assess performance. The proposed approach’s results were compared with existing methods, including CSO, QABC, PSO, GJO, FF, and machine learning techniques like SVM and RF. The hybrid CSO-HHO algorithm achieved an accuracy of 99.56%, demonstrating its superiority in text summarization tasks.
format Article
id doaj-art-c86c49aae2e140569b8cad60f471685e
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Record ID: doaj-art-c86c49aae2e140569b8cad60f471685e
Record timestamp: 2025-02-11T00:01:19Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Publication date: 2025-01-01
Volume: 13
Pages: 24515-24529
DOI: 10.1109/ACCESS.2025.3538169
IEEE Xplore document: 10870163
Author 0: Jyotirmayee Rautaray (https://orcid.org/0000-0003-2747-3919), Department of Computer Science and Engineering, Institute of Technical Education and Research, Siksha “O” Anusandhan University, Bhubaneswar, Odisha, India
Author 1: Sangram Panigrahi, Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha “O” Anusandhan University, Bhubaneswar, Odisha, India
Author 2: Ajit Kumar Nayak, Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha “O” Anusandhan University, Bhubaneswar, Odisha, India
Author 3: Premananda Sahu (https://orcid.org/0000-0002-9360-8423), School of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India
Author 4: Kaushik Mishra (https://orcid.org/0000-0001-9499-0727), Department of Computer Science and Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India
topic Text summarization
single document summarization
pre-processing
feature extraction
vectorization
hybrid CSO-HHO algorithm
url https://ieeexplore.ieee.org/document/10870163/
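
The pre-processing and TF-IDF vectorization phases named in the description can be illustrated with a minimal sketch. This is not the authors' implementation. The stop-word list, the regex-based sentence splitter, the smoothed IDF formula, and all function names are assumptions made for illustration, and lemmatization, the RBM feature extraction, and the CSO-HHO optimizer are omitted entirely.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokenization with stop-word removal (no lemmatization)."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def tfidf_vectors(sentences):
    """Return (vocabulary, vectors), treating each sentence as one 'document'."""
    tokenized = [tokenize(s) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    # Document frequency: number of sentences that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF variant (an assumption): log(N / df) + 1.
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty sentences
        vectors.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, vectors


text = "Cats chase mice. Hawks chase mice too! The cats sleep."
sentences = split_sentences(text)
vocab, vectors = tfidf_vectors(sentences)
for sentence, vector in zip(sentences, vectors):
    print(sentence, [round(x, 3) for x in vector])
```

In a pipeline of the kind the description outlines, sentence vectors like these would be the input that an optimizer scores when selecting summary sentences. How the paper combines them with the nine RBM-derived features is not stated in this record.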