Nucleic acid sequence assembly
First Claim
1. A method for determining locally optimal contig configuration of a plurality of contigs within a cluster, the method comprising:
(I) obtaining read pair data mapping to the plurality of contigs within the cluster, wherein read pair data is obtained from a set of paired end reads obtained by digesting sample DNA to generate internal double strand breaks within the DNA, allowing the double strand breaks to re-ligate randomly to form a plurality of re-ligation junctions, and sequencing at each side of the plurality of re-ligation junctions;
(II) obtaining a set of clustered contigs; and
(III) processing said set of clustered contigs by;
(a) identifying a window of size w contigs starting at position i along the set of clustered contigs;
(b) considering w! 2^w ordering and orienting options for contigs of the window of size w contigs by examining scores of orders and orientations of the contigs of the window in each position i in the window;
(c) orienting and ordering w contigs of the window to obtain an optimal score;
(d) shifting the window to position i+1 along the set of clustered contigs;
(e) repeating steps (a), (b) and (c) for said window at position i+1 using the orienting and ordering of w for said window at position i+1 contigs to determine an optimal score, thereby orienting and ordering said plurality of contigs in a locally optimal configuration relative to the score; and
(f) outputting said locally optimal configuration to a network, screen or server.
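Steps (a) through (e) of the claim can be sketched in Python as a sliding-window search. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names (`window_options`, `refine`, `toy_score`), the representation of a contig as a `(name, strand)` tuple, and the scoring function are all hypothetical. The claim does not define the score in this section, so `toy_score` is a stand-in that simply rewards consecutive contig numbers on the forward strand; a real score would be derived from the read pair data of step (I).

```python
from itertools import permutations, product


def window_options(window):
    """Yield every ordering and orientation of the contigs in `window`.

    For w contigs there are w! orderings and 2**w orientation choices,
    so w! * 2**w options in total. The first option yielded is the
    window unchanged.
    """
    for order in permutations(window):
        for flips in product((False, True), repeat=len(order)):
            yield [
                (name, -strand if flip else strand)
                for (name, strand), flip in zip(order, flips)
            ]


def refine(contigs, w, score_fn):
    """Slide a window of w contigs along the layout, one position at a time.

    At each position i, the best-scoring ordering and orientation of the
    window is kept before the window shifts to i + 1. `score_fn` is an
    assumed callable that maps a full layout to a number (higher is better).
    """
    layout = list(contigs)
    for i in range(len(layout) - w + 1):
        best = max(
            window_options(layout[i:i + w]),
            key=lambda opt: score_fn(layout[:i] + opt + layout[i + w:]),
        )
        layout[i:i + w] = best
    return layout


def toy_score(layout):
    """Hypothetical placeholder score, NOT the score used in the patent.

    Counts adjacent pairs whose names are consecutive integers and which
    both lie on the forward strand (+1).
    """
    return sum(
        1
        for (a, sa), (b, sb) in zip(layout, layout[1:])
        if b == a + 1 and sa == 1 and sb == 1
    )


if __name__ == "__main__":
    start = [(1, 1), (0, -1), (2, 1), (3, -1)]
    print(refine(start, 3, toy_score))
```

Because each window is searched exhaustively, the cost per position grows as w! 2^w (48 options for w = 3, 384 for w = 4), which is why the window is kept small and the result is only locally optimal.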
Abstract
Disclosed herein are compositions, systems and methods related to sequence assembly, such as nucleic acid sequence assembly of single reads and contigs into larger contigs and scaffolds through the use of read pair sequence information, such as read pair information indicative of nucleic acid sequence phase or physical linkage.
29 Claims
Specification