Robust and scalable barcoding for massively parallel long‑read sequencing
Date
2022-12
Publisher
Nature Research
Description
Nucleic-acid barcoding is an enabling technique for many applications, but its use remains limited
in emerging long-read sequencing technologies with intrinsically low raw accuracy. Here, we apply
so-called NS-watermark barcodes, whose error correction capability was previously validated
in silico, in a proof of concept where we synthesize 3840 NS-watermark barcodes and use them
to asymmetrically tag and simultaneously sequence amplicons from two evolutionarily distant
species (namely Bordetella pertussis and Drosophila mojavensis) on the ONT MinION platform. To our
knowledge, this is the largest number of distinct, non-random tags ever sequenced in parallel and the
first report of microarray-based synthesis as a source for large oligonucleotide pools for barcoding.
We recovered the identity of more than 86% of the barcodes, with a crosstalk rate of 0.17% (i.e., one
misassignment every 584 reads). This falls in the range of the index hopping rate of established,
high-accuracy Illumina sequencing, despite the increased number of tags and the relatively low accuracy
of both microarray-based synthesis and long-read sequencing. The robustness of NS-watermark
barcodes, together with their scalable design and compatibility with low-cost massive synthesis,
makes them promising for present and future sequencing applications requiring massive labeling, such
as long-read single-cell RNA-Seq.
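
The two ways the crosstalk figure is stated above (0.17% and one misassignment every 584 reads) are the same quantity, as a minimal Python sketch shows. The function name `crosstalk_percent` is illustrative only and does not come from the paper or its software.

```python
def crosstalk_percent(reads_per_misassignment: int) -> float:
    """Convert 'one misassignment every N reads' into a percentage rate."""
    if reads_per_misassignment <= 0:
        raise ValueError("reads_per_misassignment must be positive")
    return 100.0 / reads_per_misassignment


# Figure quoted in the abstract: one misassignment every 584 reads.
rate = crosstalk_percent(584)
print(f"{rate:.2f}%")  # prints 0.17%
```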
Keywords
DNA barcoding, High-throughput nucleotide sequencing, Sequence Analysis