MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information
Extraction of comprehensive information from diverse data sources remains a significant challenge in contemporary research. Although multimodal Named Entity Recognition (NER) and Relation Extraction (RE) tasks have garnered significant attention, existing methods often focus on surface-level information, underutilizing the potential depth of the available data. To address this issue, this study introduces a Multimodal Deep-Context Knowledge Extractor (MDCKE) that generates hierarchical multi-scale images and captions from original images. These connectors between image and text enhance information extraction by integrating more complex data relationships and contexts to build a multimodal knowledge graph. Captioning precedes feature extraction, leveraging semantic descriptions to align global and local image features and enhance inter- and intramodality alignment. Experimental validation on the Twitter2015 and Multimodal Neural Relation Extraction (MNRE) datasets demonstrated the novelty and accuracy of MDCKE, resulting in an improvement in the F1-score by up to 5.83% and 26.26%, respectively, compared to State-Of-The-Art (SOTA) models. MDCKE was compared with top models, case studies, and simulations in low-resource settings, proving its flexibility and efficacy. An ablation study further corroborated the contribution of each component, resulting in an approximately 6% enhancement in the F1-score across the datasets.
Main Authors: | Hyojin Ko, Joon Yoo, Ok-Ran Jeong |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-04-01 |
Series: | Alexandria Engineering Journal |
Subjects: | Multimodal knowledge graph; Multimodal data fusing; Information extraction; Named entity recognition; Relation extraction; Natural language processing |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1110016825001474 |
_version_ | 1823861169050353664 |
author | Hyojin Ko; Joon Yoo; Ok-Ran Jeong |
author_facet | Hyojin Ko; Joon Yoo; Ok-Ran Jeong |
author_sort | Hyojin Ko |
collection | DOAJ |
description | Extraction of comprehensive information from diverse data sources remains a significant challenge in contemporary research. Although multimodal Named Entity Recognition (NER) and Relation Extraction (RE) tasks have garnered significant attention, existing methods often focus on surface-level information, underutilizing the potential depth of the available data. To address this issue, this study introduces a Multimodal Deep-Context Knowledge Extractor (MDCKE) that generates hierarchical multi-scale images and captions from original images. These connectors between image and text enhance information extraction by integrating more complex data relationships and contexts to build a multimodal knowledge graph. Captioning precedes feature extraction, leveraging semantic descriptions to align global and local image features and enhance inter- and intramodality alignment. Experimental validation on the Twitter2015 and Multimodal Neural Relation Extraction (MNRE) datasets demonstrated the novelty and accuracy of MDCKE, resulting in an improvement in the F1-score by up to 5.83% and 26.26%, respectively, compared to State-Of-The-Art (SOTA) models. MDCKE was compared with top models, case studies, and simulations in low-resource settings, proving its flexibility and efficacy. An ablation study further corroborated the contribution of each component, resulting in an approximately 6% enhancement in the F1-score across the datasets. |
format | Article |
id | doaj-art-33c5665e190a4dbb9214983fdc031dde |
institution | Kabale University |
issn | 1110-0168 |
language | English |
publishDate | 2025-04-01 |
publisher | Elsevier |
record_format | Article |
series | Alexandria Engineering Journal |
spelling | doaj-art-33c5665e190a4dbb9214983fdc031dde 2025-02-10T04:34:14Z eng. MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information. Hyojin Ko, Joon Yoo, Ok-Ran Jeong (corresponding authors); School of Computing, Gachon University, Seongnam-si, 13129, Gyeonggi-do, Republic of Korea. Alexandria Engineering Journal, ISSN 1110-0168, Elsevier, 2025-04-01, Vol. 119, pp. 478–492. http://www.sciencedirect.com/science/article/pii/S1110016825001474 Subjects: Multimodal knowledge graph; Multimodal data fusing; Information extraction; Named entity recognition; Relation extraction; Natural language processing |
spellingShingle | Hyojin Ko; Joon Yoo; Ok-Ran Jeong; MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information; Alexandria Engineering Journal; Multimodal knowledge graph; Multimodal data fusing; Information extraction; Named entity recognition; Relation extraction; Natural language processing |
title | MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information |
title_full | MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information |
title_fullStr | MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information |
title_full_unstemmed | MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information |
title_short | MDCKE: Multimodal deep-context knowledge extractor that integrates contextual information |
title_sort | mdcke multimodal deep context knowledge extractor that integrates contextual information |
topic | Multimodal knowledge graph; Multimodal data fusing; Information extraction; Named entity recognition; Relation extraction; Natural language processing |
url | http://www.sciencedirect.com/science/article/pii/S1110016825001474 |
work_keys_str_mv | AT hyojinko mdckemultimodaldeepcontextknowledgeextractorthatintegratescontextualinformation AT joonyoo mdckemultimodaldeepcontextknowledgeextractorthatintegratescontextualinformation AT okranjeong mdckemultimodaldeepcontextknowledgeextractorthatintegratescontextualinformation |