Error detection in sequence tag directed subassemblies of short sequencing reads
Abstract
The invention provides methods for preparing DNA sequencing libraries by assembling short read sequencing data into longer contiguous sequences for genome assembly, full length cDNA sequencing, metagenomics, and the analysis of repetitive sequences of assembled genomes.
23 Claims
1. A method for detecting an error occurring in the preparation and/or sequencing of a DNA sequencing library, the method comprising:
(a) incorporating at least one first nucleic acid adaptor molecule into at least one member of a target library comprising a plurality of nucleic acid molecules, wherein the first adaptor molecule comprises a first tag sequence;
(b) amplifying the plurality of nucleic acid molecules to produce an input library comprising a first plurality of amplified DNA molecules, wherein the amplified DNA molecules comprise a sequence identical to or complementary to the first tag sequence and a sequence identical to or complementary to at least a portion of the at least one member of the target library;
(c) sequencing at least a portion of the plurality of amplified DNA molecules to produce a plurality of sequencing reads corresponding to the at least one member of the target library and comprising a sequence identical to or complementary to the first tag sequence;
(d) grouping the plurality of sequencing reads that correspond to the same at least one member of the target library based solely on the commonality of having the first tag sequence or a complement thereof to produce a plurality of grouped sequencing reads; and
(e) detecting whether an error exists at a nucleotide position, wherein an error exists when a variation of nucleotide identity exists among the plurality of grouped sequencing reads at a position corresponding to a nucleotide in the at least one member of the target library.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
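Steps (d) and (e) of claim 1 can be pictured computationally. The following is a minimal sketch and not the patented implementation. It assumes each read has already been split into a (tag, sequence) pair and that all reads sharing a tag start at the same position of the target molecule, so that index i in one read corresponds to index i in the others. The function names `group_by_tag` and `variant_positions` and the sample reads are hypothetical.

```python
from collections import defaultdict


def group_by_tag(reads):
    """Group read sequences solely by the tag they carry (step d)."""
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)
    return dict(groups)


def variant_positions(seqs):
    """Return the positions at which the grouped reads disagree (step e).

    Only positions covered by every read in the group are compared.
    """
    shortest = min(len(s) for s in seqs)
    return [i for i in range(shortest) if len({s[i] for s in seqs}) > 1]


# Hypothetical input: three reads share tag ACGT, one read carries tag GGCA.
reads = [
    ("ACGT", "TTGACCA"),
    ("ACGT", "TTGACCA"),
    ("ACGT", "TTGTCCA"),  # differs from the other two at index 3
    ("GGCA", "CATTAGC"),
]

groups = group_by_tag(reads)
errors = {tag: variant_positions(seqs) for tag, seqs in groups.items()}
print(errors)  # {'ACGT': [3], 'GGCA': []}
```

A group with a single read can never show a variation, so in this sketch error detection is only informative for tags observed more than once.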
16. A method for correcting an error occurring in the preparation and/or sequencing of a DNA sequencing library, the method comprising:
(a) incorporating at least one first nucleic acid adaptor molecule into at least one member of a target library comprising a plurality of nucleic acid molecules, wherein the first adaptor molecule comprises a first tag sequence;
(b) amplifying the plurality of nucleic acid molecules to produce an input library comprising a first plurality of amplified DNA molecules, wherein the amplified DNA molecules comprise a sequence identical to or complementary to the first tag sequence and a sequence identical to or complementary to at least a portion of the at least one member of the target library;
(c) sequencing at least a portion of the plurality of amplified DNA molecules to produce a plurality of sequencing reads corresponding to the at least one member of the target library and comprising a sequence identical to or complementary to the first tag sequence;
(d) grouping the plurality of sequencing reads that correspond to the at least one member of the target library based solely on the commonality of having the first tag sequence or a complement thereof to produce a plurality of grouped sequencing reads;
(e) detecting whether an error exists at a nucleotide position, wherein an error exists when a variation of nucleotide identity exists among the plurality of grouped sequencing reads at a position corresponding to a nucleotide in the at least one member of the target library; and
(f) determining a correct identity of the nucleotide at the position where the variation of nucleotide identity is detected, wherein the correct identity is determined based on a consensus of individual base calls in the plurality of grouped sequencing reads.
Dependent claims: 17
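Step (f) of claim 16 determines the correct base from a consensus of the base calls in a tag group. The claim does not say how the consensus is computed; the sketch below assumes a simple per-position majority vote and leaves tied positions uncorrected. The function names `consensus` and `correct_group` are hypothetical, and the reads are assumed to be position-aligned as before.

```python
from collections import Counter


def consensus(seqs):
    """Majority base at each position of one tag group.

    Returns a list with one entry per position; None marks a tie,
    where this sketch declines to pick a base.
    """
    length = min(len(s) for s in seqs)
    result = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append(None)
        else:
            result.append(counts[0][0])
    return result


def correct_group(seqs):
    """Rewrite every read with the consensus base where one exists.

    Reads are truncated to the length covered by all reads in the group.
    """
    cons = consensus(seqs)
    return [
        "".join(c if c is not None else s[i] for i, c in enumerate(cons))
        for s in seqs
    ]


# Hypothetical tag group: the third read has a T where the others have A.
group = ["TTGACCA", "TTGACCA", "TTGTCCA"]
print(correct_group(group))  # ['TTGACCA', 'TTGACCA', 'TTGACCA']
```

Majority voting needs at least three reads per tag to resolve a disagreement; with two conflicting reads the position is a tie and stays as sequenced.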
18. A method of detecting an error occurring in the preparation and/or sequencing of a DNA sequencing library, the method comprising:
(a) grouping a plurality of nucleic acid sequencing reads based solely on the commonality of having a first non-degenerate sequence tag to produce a plurality of grouped sequencing reads, wherein the nucleic acid sequencing reads are produced by:
(i) incorporating at least one first nucleic acid adaptor molecule into at least one member of a target library comprising a plurality of nucleic acid molecules, wherein the first adaptor molecule comprises the first tag sequence;
(ii) amplifying the plurality of nucleic acid molecules to produce an input library comprising a first plurality of amplified DNA molecules, wherein the amplified DNA molecules comprise a sequence identical to or complementary to the first tag sequence and a sequence identical to or complementary to at least a portion of the at least one member of the target library; and
(iii) sequencing at least a portion of the plurality of amplified DNA molecules to produce the plurality of sequencing reads that each comprise a sequence identical to or complementary to at least a portion of the at least one member of the target library and a sequence identical to or complementary to the first tag sequence; and
(b) detecting whether an error exists at a nucleotide position, wherein an error exists when a variation of nucleotide identity exists among the plurality of grouped sequencing reads at a position corresponding to a nucleotide in the at least one member of the target library.
Dependent claims: 19, 20, 21, 22, 23
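The claims allow a read to carry a sequence "identical to or complementary to" the tag, so a grouping routine has to treat both forms as the same tag. The sketch below is one possible way to do that and is not taken from the patent. It assumes that reads from the opposite strand present the tag as its reverse complement, and it reorients such reads so that positions line up within a group. The names `reverse_complement`, `canonical_tag`, and `group_by_canonical_tag` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_tag(tag):
    """Pick one fixed representative for a tag and its reverse complement."""
    return min(tag, reverse_complement(tag))


def group_by_canonical_tag(reads):
    """Group reads whose tag is either the tag itself or its reverse complement.

    Reads that carry the reverse-complemented tag are flipped so that all
    sequences in a group share the same orientation.
    """
    groups = {}
    for tag, seq in reads:
        key = canonical_tag(tag)
        if tag != key:
            seq = reverse_complement(seq)
        groups.setdefault(key, []).append(seq)
    return groups


# Hypothetical reads: the second is the opposite-strand view of the first.
reads = [("AACG", "TTGACC"), ("CGTT", "GGTCAA")]
print(group_by_canonical_tag(reads))  # {'AACG': ['TTGACC', 'TTGACC']}
```

One consequence of this design is that two distinct tags that happen to be reverse complements of each other would fall into the same group, so a tag set used with this sketch would need to avoid such pairs.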
Specification