Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation

Bibliographic Details
Main Authors: Mikhail Arbatsky, Ekaterina Vasilyeva, Veronika Sysoeva, Ekaterina Semina, Valeri Saveliev, Kseniya Rubina
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-02-01
Series: Frontiers in Bioinformatics
Online Access: https://www.frontiersin.org/articles/10.3389/fbinf.2025.1519468/full
author Mikhail Arbatsky
Ekaterina Vasilyeva
Veronika Sysoeva
Ekaterina Semina
Valeri Saveliev
Kseniya Rubina
collection DOAJ
description Processing biological data is a challenge of paramount importance as the amount of accumulated data has been annually increasing along with the emergence of new methods for studying biological objects. Blind application of mathematical methods in biology may lead to erroneous hypotheses and conclusions. Here we narrow our focus down to a small set of mathematical methods applied upon standard processing of scRNA-seq data: preprocessing, dimensionality reduction, integration, and clustering (using machine learning methods for clustering). Normalization and scaling are standard manipulations for the pre-processing with LogNormalize (natural-log transformation), CLR (centered log ratio transformation), and RC (relative counts) being employed as methods for data transformation. The justification for applying these methods in biology is not discussed in methodological articles. The essential aspect of dimensionality reduction is to identify the stable patterns which are deliberately removed upon mathematical data processing as being redundant, albeit containing important minor details for biological interpretation. There are no established rules for integration of datasets obtained at different sampling times or conditions. Clustering calls for reconsidering its application specifically for biological data processing. The novelty of the present study lies in an integrated approach of biology and bioinformatics to elucidate biological insights upon data processing.
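The three transformations named in the description can be sketched for a single cell as follows. This is a minimal illustration in plain Python, not the Seurat implementation: the function names, the toy counts, and the scale factor of 10,000 are assumptions made for the example, and the CLR shown is one pseudocount-based variant (log1p of each value divided by a geometric-mean-like term), chosen so that zero counts stay defined.

```python
import math


def relative_counts(cell, scale_factor=1.0):
    """RC: divide each count by the cell's total, then multiply by a scale factor."""
    total = sum(cell)
    if total == 0:
        raise ValueError("cell has no counts")
    return [c / total * scale_factor for c in cell]


def log_normalize(cell, scale_factor=10_000.0):
    """LogNormalize: scaled relative counts followed by natural log(1 + x)."""
    return [math.log1p(v) for v in relative_counts(cell, scale_factor)]


def clr(values):
    """CLR variant (assumed form): log1p(x / g), where g is the exponential of
    the sum of log1p over the nonzero entries divided by the full length."""
    g = math.exp(sum(math.log1p(v) for v in values if v > 0) / len(values))
    return [math.log1p(v / g) for v in values]


# Toy UMI counts for four genes in one cell (made-up numbers).
cell = [0, 1, 3, 6]

rc = relative_counts(cell)   # fractions of the cell's total counts
ln = log_normalize(cell)     # log-compressed, depth-adjusted values
cl = clr(cell)               # values relative to the cell's own "average" level

for name, values in (("RC", rc), ("LogNormalize", ln), ("CLR", cl)):
    print(name, [round(v, 3) for v in values])
```

All three keep the rank order of genes within the cell, but they differ in how strongly they compress high counts, which is one reason the choice of method can change downstream biological interpretation.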
id doaj-art-57378452de2d4057b43b6b447022621b
institution Kabale University
issn 2673-7647
spelling Record ID: doaj-art-57378452de2d4057b43b6b447022621b
Record timestamp: 2025-02-12T07:26:08Z
Language code: eng
Publisher: Frontiers Media S.A.
Journal: Frontiers in Bioinformatics
ISSN: 2673-7647
Publication date: 2025-02-01
Volume: 5
DOI: 10.3389/fbinf.2025.1519468
Article number: 1519468
Title: Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation
Authors and affiliations:
Mikhail Arbatsky (Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia)
Ekaterina Vasilyeva (Institute of Higher Technologies, Immanuel Kant Baltic Federal University, Kaliningrad, Russia)
Veronika Sysoeva (Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia)
Ekaterina Semina (Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia; Institute of Medicine and Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia)
Valeri Saveliev (Institute of Higher Technologies, Immanuel Kant Baltic Federal University, Kaliningrad, Russia)
Kseniya Rubina (Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia)
Full text: https://www.frontiersin.org/articles/10.3389/fbinf.2025.1519468/full
Keywords: biocentric mathematics; ScRNA-seq; dimension reduction; cell clustering; datasets integration
title Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation
topic biocentric mathematics
ScRNA-seq
dimension reduction
cell clustering
datasets integration