Conditional similarity triplets enable covariate-informed representations of single-cell data

Bibliographic Details
Main Authors: Chi-Jane Chen, Haidong Yi, Natalie Stanley
Format: Article
Language: English
Published: BMC 2025-02-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-025-06069-5
Abstract

Background: Single-cell technologies enable comprehensive profiling of diverse immune cell types by measuring multiple genes or proteins in each individual cell. To translate immune signatures assayed from blood or tissue into powerful diagnostics, machine learning approaches are often used to compute immunological summaries, or per-sample featurizations, which serve as inputs to models of outcomes of interest. Current supervised learning approaches for computing per-sample representations are trained only to predict a single outcome accurately, and they do not take into account relevant clinical features or covariates that are likely to be measured for each sample as well.

Results: Here, we introduce an approach that incorporates measured covariates when optimizing model parameters, so that the resulting per-sample encodings reflect both immune signatures and additional clinical information. Our method, CytoCoSet, is a set-based encoding method for learning per-sample featurizations. Its loss function includes an additional triplet term that penalizes samples with similar covariates for having dissimilar per-sample embeddings.

Conclusions: Overall, incorporating clinical covariates enables the learning of per-sample encodings that improve prediction of clinical outcomes. This integration of disparate information yields more robust predictions of clinical phenotypes and holds significant potential for enhancing diagnostic and treatment strategies.
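The abstract describes a loss made of an outcome objective plus a triplet term that pulls together samples with similar covariates. The following is a minimal, hypothetical Python sketch of that idea, not the authors' CytoCoSet implementation. The mean-pooling set encoder, the rule that picks each anchor's positive (nearest covariates) and negative (farthest covariates), the squared Euclidean distance, and the `margin` and `lam` values are all illustrative assumptions, as are the helper names.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def encode_sample(cells: Sequence[Sequence[float]], weights: Sequence[Sequence[float]]) -> Vector:
    """Assumed permutation-invariant set encoder: project each cell linearly, then mean-pool.

    `cells` is one sample (a non-empty set of per-cell marker vectors); `weights` is a
    hypothetical embedding_dim x n_markers matrix that a real model would learn.
    """
    pooled = [0.0] * len(weights)
    for cell in cells:
        for k, row in enumerate(weights):
            pooled[k] += sum(w * x for w, x in zip(row, cell))
    return [value / len(cells) for value in pooled]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pick_triplet(anchor: int, covariates: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Hypothetical rule: positive = most similar covariates, negative = least similar.

    Requires at least two samples besides the anchor to be meaningful.
    """
    others = [j for j in range(len(covariates)) if j != anchor]
    positive = min(others, key=lambda j: sq_dist(covariates[anchor], covariates[j]))
    negative = max(others, key=lambda j: sq_dist(covariates[anchor], covariates[j]))
    return positive, negative


def triplet_term(z_a: Vector, z_p: Vector, z_n: Vector, margin: float = 1.0) -> float:
    """Triplet margin loss: zero once d(a, p) + margin <= d(a, n), using squared distances."""
    return max(0.0, sq_dist(z_a, z_p) - sq_dist(z_a, z_n) + margin)


def covariate_triplet_loss(
    embeddings: Sequence[Vector], covariates: Sequence[Sequence[float]], margin: float = 1.0
) -> float:
    """Average triplet term, using every sample once as the anchor."""
    terms = []
    for i, z_a in enumerate(embeddings):
        p, n = pick_triplet(i, covariates)
        terms.append(triplet_term(z_a, embeddings[p], embeddings[n], margin))
    return sum(terms) / len(terms)


def total_loss(
    prediction_loss: float,
    embeddings: Sequence[Vector],
    covariates: Sequence[Sequence[float]],
    lam: float = 0.5,
    margin: float = 1.0,
) -> float:
    """Outcome loss plus a weighted covariate triplet penalty (`lam` is an assumed weight)."""
    return prediction_loss + lam * covariate_triplet_loss(embeddings, covariates, margin)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]           # identity projection, for readability
    covariates = [[30.0], [32.0], [70.0]]  # one made-up clinical covariate per sample
    good = [encode_sample(s, W) for s in ([[0, 0], [2, 0]], [[1, 1], [1, -1]], [[4, 4], [6, 4]])]
    bad = [encode_sample(s, W) for s in ([[0, 0], [2, 0]], [[5, 4], [5, 4]], [[1, 0], [1, 0]])]
    print(round(covariate_triplet_loss(good, covariates), 3))  # → 0.333
    print(round(covariate_triplet_loss(bad, covariates), 3))   # → 22.333
```

In this toy setup, samples 0 and 1 have similar covariate values (30 and 32), so the `good` embeddings, which place them together, incur a small penalty, while the `bad` embeddings, which separate them, are penalized far more heavily. In an actual model, this term would be minimized jointly with the outcome loss by gradient descent, which the sketch omits.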
ISSN: 1471-2105
Volume: 26, Issue: 1
Affiliation (all authors): Department of Computer Science, The University of North Carolina at Chapel Hill
Keywords: Single-cell; Immune profiling; Deep-learning; Clinical prediction