Explainable Graph Spectral Clustering of text documents.

Spectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, the results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relati...

Full description

Saved in:
Bibliographic Details
Main Authors: Bartłomiej Starosta, Mieczysław A Kłopotek, Sławomir T Wierzchoń, Dariusz Czerski, Marcin Sydow, Piotr Borkowski
Format: Article
Language:English
Published: Public Library of Science (PLoS) 2025-01-01
Series:PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0313238
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1823864102810812416
author Bartłomiej Starosta
Mieczysław A Kłopotek
Sławomir T Wierzchoń
Dariusz Czerski
Marcin Sydow
Piotr Borkowski
author_facet Bartłomiej Starosta
Mieczysław A Kłopotek
Sławomir T Wierzchoń
Dariusz Czerski
Marcin Sydow
Piotr Borkowski
author_sort Bartłomiej Starosta
collection DOAJ
description Spectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, the results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Therefore, there is an urgent need to elaborate methods for explaining the outcome of the clustering. We have constructed in this paper a theoretical bridge linking the clusters resulting from Graph Spectral Clustering and the actual document content, given that similarities between documents are computed as cosine measures in tf or tfidf representation. This link enables to provide with explanation of cluster membership in clusters produced by GSA. We present a proposal of explanation of the results of combinatorial and normalized Laplacian based graph spectral clustering. For this purpose, we show (approximate) equivalence of combinatorial Laplacian embedding and of K-embedding (proposed in this paper) and term vector space embedding. We performed an experimental study showing that K-embedding approximates well Laplacian embedding under favourable block matrix conditions and show that approximation is good enough under other conditions. We show also perfect equivalence of normalized Laplacian embedding and the [Formula: see text]-embedding (proposed in this paper) and (weighted) term vector space embedding. Hence a bridge is constructed between the textual contents and the clustering results using both combinatorial and normalized Laplacian based Graph Spectral Clustering methods. We provide a theoretical background for our approach. An initial version of this paper is available at arXiv, (Starosta B 2023). The Reader may refer to that text to get acquainted with formal aspects of our method and find a detailed overview of motivation.
format Article
id doaj-art-9a103d5704844228b5316006b99bd910
institution Kabale University
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-9a103d5704844228b5316006b99bd9102025-02-09T05:30:36ZengPublic Library of Science (PLoS)PLoS ONE1932-62032025-01-01202e031323810.1371/journal.pone.0313238Explainable Graph Spectral Clustering of text documents.Bartłomiej StarostaMieczysław A KłopotekSławomir T WierzchońDariusz CzerskiMarcin SydowPiotr BorkowskiSpectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, the results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Therefore, there is an urgent need to elaborate methods for explaining the outcome of the clustering. We have constructed in this paper a theoretical bridge linking the clusters resulting from Graph Spectral Clustering and the actual document content, given that similarities between documents are computed as cosine measures in tf or tfidf representation. This link enables to provide with explanation of cluster membership in clusters produced by GSA. We present a proposal of explanation of the results of combinatorial and normalized Laplacian based graph spectral clustering. For this purpose, we show (approximate) equivalence of combinatorial Laplacian embedding and of K-embedding (proposed in this paper) and term vector space embedding. We performed an experimental study showing that K-embedding approximates well Laplacian embedding under favourable block matrix conditions and show that approximation is good enough under other conditions. We show also perfect equivalence of normalized Laplacian embedding and the [Formula: see text]-embedding (proposed in this paper) and (weighted) term vector space embedding. Hence a bridge is constructed between the textual contents and the clustering results using both combinatorial and normalized Laplacian based Graph Spectral Clustering methods. We provide a theoretical background for our approach. An initial version of this paper is available at arXiv, (Starosta B 2023). The Reader may refer to that text to get acquainted with formal aspects of our method and find a detailed overview of motivation.https://doi.org/10.1371/journal.pone.0313238
spellingShingle Bartłomiej Starosta
Mieczysław A Kłopotek
Sławomir T Wierzchoń
Dariusz Czerski
Marcin Sydow
Piotr Borkowski
Explainable Graph Spectral Clustering of text documents.
PLoS ONE
title Explainable Graph Spectral Clustering of text documents.
title_full Explainable Graph Spectral Clustering of text documents.
title_fullStr Explainable Graph Spectral Clustering of text documents.
title_full_unstemmed Explainable Graph Spectral Clustering of text documents.
title_short Explainable Graph Spectral Clustering of text documents.
title_sort explainable graph spectral clustering of text documents
url https://doi.org/10.1371/journal.pone.0313238
work_keys_str_mv AT bartłomiejstarosta explainablegraphspectralclusteringoftextdocuments
AT mieczysławakłopotek explainablegraphspectralclusteringoftextdocuments
AT sławomirtwierzchon explainablegraphspectralclusteringoftextdocuments
AT dariuszczerski explainablegraphspectralclusteringoftextdocuments
AT marcinsydow explainablegraphspectralclusteringoftextdocuments
AT piotrborkowski explainablegraphspectralclusteringoftextdocuments