MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data
Abstract
Motivation: Reliable estimation of the mean fragment length for
next-generation short-read sequencing data is an important step in
next-generation sequencing analysis pipelines, most notably because
of its impact on the accuracy of the enriched regions identified by
peak-calling algorithms. Although many peak-calling algorithms include
a fragment-length estimation subroutine, the problem has not
been adequately solved, as demonstrated by the variability of the estimates
returned by different algorithms.
Results: In this article, we investigate the use of strand cross-correlation
to estimate mean fragment length of single-end data and
show that traditional estimation approaches have mixed reliability. We
observe that the mappability of different parts of the genome can
introduce an artificial bias into cross-correlation computations, resulting
in incorrect fragment-length estimates. We propose a new approach,
called mappability-sensitive cross-correlation (MaSC), which
removes this bias and allows for accurate and reliable fragment-length
estimation. We analyze the computational complexity of this approach,
and evaluate its performance on a test suite of NGS datasets,
demonstrating its superiority to traditional cross-correlation analysis.
Availability: An open-source Perl implementation of our approach is
available at http://www.perkinslab.ca/Software.html.
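
As an illustration of the idea summarized above, the following is a minimal Python sketch of strand cross-correlation with and without a mappability restriction. It is not the authors' Perl implementation. The function names (`pearson`, `naive_cc`, `mappability_cc`, `estimate_shift`), the boolean mappability tracks and the toy read positions are all hypothetical. Restricting the correlation at each shift to position pairs that are mappable on both strands is an assumed reading of "mappability-sensitive" and is not spelled out in the abstract.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) * (x - mx) for x in xs)
    vy = sum((y - my) * (y - my) for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def naive_cc(fwd, rev, d):
    """Correlate forward-strand counts at i with reverse-strand counts at i + d,
    using every position regardless of mappability."""
    n = len(fwd) - d
    return pearson(fwd[:n], rev[d:d + n])


def mappability_cc(fwd, rev, map_fwd, map_rev, d):
    """Same correlation, restricted to positions i where i is mappable on the
    forward strand and i + d is mappable on the reverse strand (an assumption)."""
    keep = [i for i in range(len(fwd) - d) if map_fwd[i] and map_rev[i + d]]
    return pearson([fwd[i] for i in keep], [rev[i + d] for i in keep])


def estimate_shift(cc, max_shift):
    """Return the shift in 0..max_shift with the highest correlation."""
    scores = [cc(d) for d in range(max_shift + 1)]
    return scores.index(max(scores))


# Invented toy data: each forward read start has a reverse read 5 bp downstream.
length, true_shift = 60, 5
starts = [3, 10, 22, 31, 40, 47]
fwd = [0] * length
rev = [0] * length
for s in starts:
    fwd[s] = 1
    rev[s + true_shift] = 1

# Hypothetical mappability tracks: the last 10 positions are unmappable.
map_fwd = [i < 50 for i in range(length)]
map_rev = [i < 50 for i in range(length)]

naive_est = estimate_shift(lambda d: naive_cc(fwd, rev, d), 15)
masked_est = estimate_shift(
    lambda d: mappability_cc(fwd, rev, map_fwd, map_rev, d), 15
)
print(naive_est, masked_est)
```

On this clean toy input both estimators recover the same shift. The sketch only shows where a mappability mask enters the computation; it does not reproduce the bias reported in the abstract, and how a winning shift converts to a fragment length depends on the read-position convention used.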
