Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning

Bibliographic Details
Main Authors: Zhaoxiang Cai, Sofia Apolinário, Ana R. Baião, Clare Pacini, Miguel D. Sousa, Susana Vinga, Roger R. Reddel, Phillip J. Robinson, Mathew J. Garnett, Qing Zhong, Emanuel Gonçalves
Format: Article
Language: English
Published: Nature Portfolio 2024-11-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-024-54771-4
collection DOAJ
description Abstract Integrating diverse types of biological data is essential for a holistic understanding of cancer biology, yet it remains challenging due to data heterogeneity, complexity, and sparsity. Addressing this, our study introduces an unsupervised deep learning model, MOSA (Multi-Omic Synthetic Augmentation), specifically designed to integrate and augment the Cancer Dependency Map (DepMap). Harnessing orthogonal multi-omic information, this model successfully generates molecular and phenotypic profiles, resulting in an increase of 32.7% in the number of multi-omic profiles and thereby generating a complete DepMap for 1523 cancer cell lines. The synthetically enhanced data increases statistical power, uncovering less studied mechanisms associated with drug resistance, and refines the identification of genetic associations and clustering of cancer cell lines. By applying SHapley Additive exPlanations (SHAP) for model interpretation, MOSA reveals multi-omic features essential for cell clustering and biomarker identification related to drug and gene dependencies. This understanding is crucial for developing much-needed effective strategies to prioritize cancer targets.
format Article
id doaj-art-58b951e3cf64437ebfa82518119ab27f
institution Kabale University
issn 2041-1723
language English
publishDate 2024-11-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-58b951e3cf64437ebfa82518119ab27f 2025-02-09T12:43:49Z eng Nature Portfolio Nature Communications 2041-1723 2024-11-01 151112 10.1038/s41467-024-54771-4
Authors: Zhaoxiang Cai (0), Sofia Apolinário (1), Ana R. Baião (2), Clare Pacini (3), Miguel D. Sousa (4), Susana Vinga (5), Roger R. Reddel (6), Phillip J. Robinson (7), Mathew J. Garnett (8), Qing Zhong (9), Emanuel Gonçalves (10)
Affiliations:
0 ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney
1 INESC-ID
2 INESC-ID
3 Wellcome Sanger Institute, Wellcome Genome Campus
4 INESC-ID
5 INESC-ID
6 ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney
7 ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney
8 Wellcome Sanger Institute, Wellcome Genome Campus
9 ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney
10 INESC-ID
title Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning
url https://doi.org/10.1038/s41467-024-54771-4