77. PSEUDO-CHROMOSOME ASSEMBLY OF LARGE AND COMPLEX GENOMES USING MULTIPLE REFERENCES
Name: Mikhail Alekseyevich Kolmogorov
Grad Year: 2020
Despite the rapid development of sequencing technologies, assembling mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout2, a reference-assisted assembly tool for large and complex genomes. Taking as input a target assembly (generated by an NGS assembler) and one or more related references, Ragout2 infers their evolutionary relationships and builds the final assembly of the target genome using a genome rearrangement approach. While Ragout2 and Ragout (designed for bacterial genomes) both use multiple references to infer the final assembly, Ragout2 addresses many new algorithmic challenges that do not arise in bacterial genome assembly. Indeed, reference-assisted assembly of mammalian-scale genomes is intrinsically harder than that of bacterial genomes, since the genome size, the rearrangement distance, the complexity of the repeat structure, and the number of misassemblies are all higher in the mammalian case. Using simulated data and two real wild-type mouse assemblies with extensive rearrangements, we show that, given multiple references, Ragout2 can turn a typical mammalian assembly with thousands of contigs into high-quality chromosomes. Strikingly, chromosome color maps confirmed most of the large-scale rearrangements that Ragout2 detected. The Ragout2 software is freely available at http://fenderglass.github.io/Ragout/.
Industry Application Area(s)
Software, Analytics | Bioinformatics