Error detection in sequence tag directed subassemblies of short sequencing reads
Abstract
The invention provides compositions and methods for preparing DNA sequencing libraries. In particular, the method relates to preparing DNA sequencing libraries from kilobase scale nucleic acids. The invention also provides methods for assembling short read sequencing data into longer contiguous sequences. The method is useful for various applications in genomics, including genome assembly, full length cDNA sequencing, metagenomics, and the analysis of repetitive sequences of assembled genomes.
19 Claims
1. A method for detecting errors occurring in the preparation and/or sequencing of a DNA sequencing library, the method comprising:
(a) incorporating at least one first nucleic acid adaptor molecule into at least one member of a target library comprising a plurality of nucleic acid molecules, wherein the first adaptor molecule comprises a first defined sequence;
(b) amplifying the plurality of nucleic acid molecules to produce an input library comprising a first plurality of amplified DNA molecules, wherein the amplified molecules comprise a sequence identical to or complementary to at least a portion of the first adaptor molecule and sequence identical to or complementary to at least a portion of the at least one member of the target library;
(c) sequencing at least a portion of the plurality of amplified DNA molecules to produce a plurality of sequencing reads corresponding to the at least one member of the target library;
(d) grouping the plurality of sequencing reads that correspond to the at least one member of the target library; and
(e) detecting whether an error exists at a nucleotide position, wherein an error exists when variation of nucleotide identity exists among the grouped sequencing reads at a position corresponding to a nucleotide in the at least one member of the target library.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
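Steps (d) and (e) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that each read begins with a fixed-length tag derived from the adaptor's defined sequence, that reads sharing a tag come from the same target-library member, and that the remaining bases are already aligned position by position. The function names `group_reads_by_tag` and `find_variable_positions` and the parameter `tag_length` are hypothetical.

```python
from collections import defaultdict


def group_reads_by_tag(reads, tag_length):
    """Group reads by their leading tag (assumed layout: tag + insert).

    Returns a dict mapping each tag to the list of insert sequences
    that followed it.
    """
    groups = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        groups[tag].append(insert)
    return dict(groups)


def find_variable_positions(group):
    """Return 0-based positions at which the grouped reads disagree.

    Only positions covered by every read in the group are compared,
    since the reads are assumed to be aligned from their first base.
    """
    shared_length = min(len(read) for read in group)
    return [
        i for i in range(shared_length)
        if len({read[i] for read in group}) > 1
    ]


# Hypothetical reads: a 3-base tag followed by a 5-base insert.
reads = ["AAGTACGT", "AAGTACGT", "AAGTTCGT", "CCGGGATC", "CCGGGATC"]
groups = group_reads_by_tag(reads, tag_length=3)
errors = {tag: find_variable_positions(g) for tag, g in groups.items()}
```

In this toy input the group tagged `AAG` disagrees at one position, while the group tagged `CCG` is uniform. Real reads would first need alignment and orientation handling, which this sketch leaves out.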
18. A method for correcting errors occurring in the preparation and/or sequencing of a DNA sequencing library, the method comprising:
(a) incorporating at least one first nucleic acid adaptor molecule into at least one member of a target library comprising a plurality of nucleic acid molecules, wherein the first adaptor molecule comprises a first defined sequence;
(b) amplifying the plurality of nucleic acid molecules to produce an input library comprising a first plurality of amplified DNA molecules, wherein the amplified molecules comprise a sequence identical to or complementary to at least a portion of the first adaptor molecule and a sequence identical to or complementary to at least a portion of the at least one member of the target library;
(c) sequencing at least a portion of the plurality of amplified DNA molecules to produce a plurality of sequencing reads corresponding to the at least one member of the target library;
(d) grouping the plurality of sequencing reads that correspond to the at least one member of the target library;
(e) detecting whether an error exists at a nucleotide position, wherein an error exists when variation of nucleotide identity exists among the grouped sequencing reads at a position corresponding to a nucleotide in the at least one member of the target library; and
(f) determining the correct identity of the nucleotide at the position where variation of nucleotide identity is detected, wherein the correct identity is determined based on the consensus of the individual base calls in the plurality of grouped sequencing reads.
Dependent claims: 19
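The consensus determination in step (f) of claim 18 can be sketched as a per-position majority vote. This is an illustrative assumption: the claim says only that the correct identity is based on the consensus of the individual base calls and does not fix a voting rule. The function name `consensus_call`, the `min_fraction` threshold, and the use of `N` for unresolved positions are all hypothetical choices.

```python
from collections import Counter


def consensus_call(group, min_fraction=0.5):
    """Build a consensus sequence from aligned reads in one group.

    At each position the most frequent base is kept if it accounts for
    strictly more than `min_fraction` of the reads; otherwise the
    position is reported as 'N' (unresolved).
    """
    shared_length = min(len(read) for read in group)
    consensus = []
    for i in range(shared_length):
        counts = Counter(read[i] for read in group)
        base, count = counts.most_common(1)[0]
        if count / len(group) > min_fraction:
            consensus.append(base)
        else:
            consensus.append("N")
    return "".join(consensus)


# Hypothetical group: two reads agree, one carries a likely error.
corrected = consensus_call(["TACGT", "TACGT", "TTCGT"])
# A two-read group that disagrees cannot be resolved by majority.
unresolved = consensus_call(["AC", "AG"])
```

A strict majority is the simplest rule. A practical pipeline would more likely weight each base call by its quality score and require a minimum number of reads per group before calling a consensus.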
Specification