Objective review of de novo stand-alone error correction methods for NGS data

UPV authors
Year
Journal: Wiley Interdisciplinary Reviews. Computational Molecular Science (Online)

Abstract

The sequencing market has grown steadily over the last few years, and the different approaches to reading DNA information are prone to different types of errors. Multiple studies have demonstrated the impact of sequencing errors on different applications of next-generation sequencing (NGS), making error correction a fundamental initial step. The methods in the literature take different approaches and suit different types of problems. We analyzed 50 methods divided into five main approaches (k-spectrum, suffix arrays, multiple-sequence alignment, read clustering, and probabilistic models). All of them are stand-alone (not published as part of a suite) and de novo (they target raw, unprocessed data without an existing reference genome). These correctors handle one or more sequencing technologies using the same or different approaches. They face general challenges, sometimes with traits specific to particular technologies, such as repetitive regions, uncalled bases, and ploidy. Even assessing their performance is a challenge in itself because of the approaches taken by various authors, the unknown factor of working de novo, and the behavior of the third-party tools employed in the benchmarks. This study aims to help researchers advance the field of error correction, to give educators a brief but comprehensive companion, and to help bioinformaticians choose the right tool for the right job.