HairSplitter: haplotype assembly from long, noisy reads
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Peer Community In, 2024-10-01 |
Series: | Peer Community Journal |
Subjects: | |
Online Access: | https://peercommunityjournal.org/articles/10.24072/pcjournal.481/ |
Summary: | Motivation: Long-read assemblers struggle to distinguish closely related viral or bacterial strains and often collapse similar strains into a single sequence. This limitation hampers metagenome analysis, as different strains may harbor crucial functional distinctions. Results: We introduce HairSplitter, a new software tool designed to retrieve strains from a partially or totally collapsed assembly and long reads. The method uses a custom variant-calling process to cope with error-prone long reads and introduces a new read binning algorithm to recover an a priori unknown number of strains. On noisy long reads, HairSplitter recovers more strains than state-of-the-art tools while running faster, for both viruses and bacteria. Availability: HairSplitter is freely available on GitHub at https://github.com/RolandFaure/Hairsplitter (https://doi.org/10.5281/zenodo.13753481). |
---|---|
ISSN: | 2804-3871 |