MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information
Main Authors:
Format: Article
Language: English
Published: Elsevier, 2025-04-01
Series: Alexandria Engineering Journal
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S1110016825001474
Summary: Extraction of comprehensive information from diverse data sources remains a significant challenge in contemporary research. Although multimodal Named Entity Recognition (NER) and Relation Extraction (RE) tasks have garnered significant attention, existing methods often focus on surface-level information, underutilizing the potential depth of the available data. To address this issue, this study introduces a Multimodal Deep-Context Knowledge Extractor (MDCKE) that generates hierarchical multi-scale images and captions from original images. These connectors between image and text enhance information extraction by integrating more complex data relationships and contexts to build a multimodal knowledge graph. Captioning precedes feature extraction, leveraging semantic descriptions to align global and local image features and to enhance inter- and intramodality alignment. Experimental validation on the Twitter2015 and Multimodal Neural Relation Extraction (MNRE) datasets demonstrated the novelty and accuracy of MDCKE, yielding F1-score improvements of up to 5.83% and 26.26%, respectively, over State-Of-The-Art (SOTA) models. MDCKE was further compared against leading models and evaluated through case studies and simulations in low-resource settings, demonstrating its flexibility and efficacy. An ablation study corroborated the contribution of each component, with the full model improving the F1-score by approximately 6% across the datasets.
ISSN: 1110-0168
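The summary describes a captioning-first, multi-scale pipeline: hierarchical image regions are captioned, and those captions then serve as connectors between visual and textual features before NER/RE. The following minimal Python sketch illustrates only that data flow; every name in it (Region, generate_scales, caption, extract) is a hypothetical stand-in and does not reflect the paper's actual implementation or API.

```python
# Minimal sketch of a captioning-first, multi-scale extraction pipeline,
# as described in the abstract. All function and type names are illustrative
# assumptions, not the authors' code.

from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    scale: int     # 0 = full image (global), higher = finer local crop
    pixels: bytes  # placeholder for raw image data


def generate_scales(image: bytes, levels: int = 3) -> List[Region]:
    """Produce hierarchical multi-scale regions (full image plus crops).

    A real system would crop and resize; this stub just tags the same
    bytes once per scale level.
    """
    return [Region(scale=s, pixels=image) for s in range(levels)]


def caption(region: Region) -> str:
    """Stand-in for an image-captioning model applied to one region."""
    return f"caption describing content at scale {region.scale}"


def extract(image: bytes, sentence: str) -> dict:
    """Captioning precedes feature extraction: generated captions act as
    connectors aligning the sentence with global (scale 0) and local
    (finer-scale) visual context before NER/RE."""
    regions = generate_scales(image)
    captions = [caption(r) for r in regions]
    # A real model would fuse sentence, captions, and visual features and
    # feed them to a multimodal knowledge-graph builder; this sketch only
    # shows the ordering of the stages.
    return {"sentence": sentence, "captions": captions,
            "entities": [], "relations": []}


if __name__ == "__main__":
    result = extract(b"...", "Steve Jobs founded Apple in Cupertino.")
    print(result["captions"])
```

The point of the sketch is the ordering: captions are produced per scale level before any joint feature extraction, which is the inter-/intramodality alignment step the abstract attributes to MDCKE.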