Cancer phylogenetic tree inference at scale from 1000s of single cell genomes

Bibliographic Details
Main Authors: Salehi, Sohrab, Dorri, Fatemeh, Chern, Kevin, Kabeer, Farhia, Rusk, Nicole, Funnell, Tyler, Williams, Marc J., Lai, Daniel, Andronescu, Mirela, Campbell, Kieran R., McPherson, Andrew, Aparicio, Samuel, Roth, Andrew, Shah, Sohrab P., Bouchard-Côté, Alexandre
Format: Article
Language: English
Published: Peer Community In, 2023-07-01
Series: Peer Community Journal
Online Access: https://peercommunityjournal.org/articles/10.24072/pcjournal.292/
collection DOAJ
description A new generation of scalable single cell whole genome sequencing (scWGS) methods allows unprecedented high resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing the mutational processes that gave rise to them. Existing phylogenetic tree building models do not scale to the tens of thousands of high resolution genomes achievable with current scWGS methods. We constructed a phylogenetic model and associated Bayesian inference procedure, sitka, specifically for scWGS data. The method is based on a novel phylogenetic encoding of copy number (CN) data, the sitka transformation, that simplifies the site dependencies induced by rearrangements while still forming a sound foundation for phylogenetic inference. The sitka transformation allows us to design novel scalable Markov chain Monte Carlo (MCMC) algorithms. Moreover, we introduce a novel point mutation calling method that incorporates the CN data and the underlying phylogenetic tree to overcome the low per-cell coverage of scWGS. We demonstrate our method on three single cell datasets, including a novel PDX series, and analyse the topological properties of the inferred trees. Sitka is freely available at https://github.com/UBC-Stat-ML/sitkatree.git
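The description above characterizes the sitka transformation only as an encoding of copy number data that simplifies site dependencies. As a hedged illustration, the sketch below assumes the encoding recodes each cell's integer copy number profile into binary indicators marking where the copy number changes between adjacent genomic bins. The function names `changepoint_encoding` and `encode_matrix` are hypothetical; this is not the implementation in the sitka repository.

```python
from typing import List


def changepoint_encoding(cn_profile: List[int]) -> List[int]:
    """Hypothetical sketch: emit 1 where the integer copy number differs
    between two adjacent genomic bins, and 0 where it stays the same."""
    # A profile with n bins yields n - 1 adjacent-bin indicators.
    return [int(left != right) for left, right in zip(cn_profile, cn_profile[1:])]


def encode_matrix(cn_matrix: List[List[int]]) -> List[List[int]]:
    """Apply the assumed encoding row by row: one row per cell,
    one column per pair of adjacent bins."""
    return [changepoint_encoding(row) for row in cn_matrix]


if __name__ == "__main__":
    # Toy copy number profiles for three cells over five bins (made-up data).
    cells = [
        [2, 2, 3, 3, 2],
        [2, 2, 2, 2, 2],
        [2, 1, 1, 3, 3],
    ]
    for row in encode_matrix(cells):
        print(row)
```

Under this assumption, each column becomes a binary character that cells either share or lack, which is the kind of site-wise representation that standard phylogenetic machinery, including MCMC over tree topologies, can work with.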
issn 2804-3871
volume 3
doi 10.24072/pcjournal.292
affiliations
Salehi, Sohrab: Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, USA
Dorri, Fatemeh: Department of Computer Science, University of British Columbia, Canada
Chern, Kevin: Department of Statistics, University of British Columbia, Canada
Kabeer, Farhia (https://orcid.org/0000-0003-3456-507X): Department of Pathology and Laboratory Medicine, University of British Columbia, Canada
Rusk, Nicole (https://orcid.org/0000-0003-2663-6288): Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, USA
Funnell, Tyler (https://orcid.org/0000-0003-1612-5644): Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, USA
Williams, Marc J. (https://orcid.org/0000-0001-5524-4174): Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, USA
Lai, Daniel (https://orcid.org/0000-0001-9203-6323): Department of Pathology and Laboratory Medicine, University of British Columbia, Canada; Department of Molecular Oncology, BC Cancer Research Centre, Canada
Andronescu, Mirela: Department of Pathology and Laboratory Medicine, University of British Columbia, Canada; Department of Molecular Oncology, BC Cancer Research Centre, Canada
Campbell, Kieran R. (https://orcid.org/0000-0003-1981-5763): Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Canada; Department of Molecular Genetics, University of Toronto, Canada; Department of Statistical Sciences, University of Toronto, Canada
McPherson, Andrew (https://orcid.org/0000-0002-5654-5101): Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, USA
Aparicio, Samuel (https://orcid.org/0000-0002-0487-9599): Department of Pathology and Laboratory Medicine, University of British Columbia, Canada; Department of Molecular Oncology, BC Cancer Research Centre, Canada
Roth, Andrew (https://orcid.org/0000-0003-3422-8823): Department of Computer Science, University of British Columbia, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Canada; Department of Molecular Oncology, BC Cancer Research Centre, Canada
Shah, Sohrab P. (https://orcid.org/0000-0001-6402-523X): Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, USA
Bouchard-Côté, Alexandre: Department of Statistics, University of British Columbia, Canada
topic Phylogenetics, Cancer evolution, Bayesian statistics, MCMC, Copy number evolution, PDX, Triple negative breast cancer