Indexing a reference sequence for oligomer sequence mapping
Abstract
Generating an index includes receiving a reference sequence and applying one or more key patterns to the reference sequence to obtain a plurality of keys in the index. Each of the one or more key patterns is derived based on a corresponding set of oligomer sequence relationships of a plurality of oligomer sequences that are expected to be generated from the reference, and the keys correspond to a plurality of candidate and/or validated locations in the reference sequence.
28 Claims
1. A method of generating an index, the index operable to determine where, in a reference sequence, a data set of one or more related oligomer sequences maps to the reference sequence, the oligomer sequences of the data set obtained from a same fragment of genetic material, the method comprising:
applying, with a computer system, a key pattern to the reference sequence to generate a plurality of keys, wherein the key pattern includes a first set of N contiguous positions separated by K positions from a second set of M contiguous positions, the separation being based on predicted relationships between oligomer sequences of the data set, wherein the key pattern is defined by predetermined values for N, K and M, the applying including:
applying the key pattern to a first location of the reference sequence to obtain a first set of bases, the first set of bases including:
N contiguous bases of the reference sequence starting from the first location, and
M contiguous bases of the reference sequence starting from N+K positions after the first location, N, M, and K being integers greater than or equal to one;
using the first set of bases to generate a first key;
applying the key pattern to a plurality of other locations to generate other keys, wherein the applying of the key pattern to the first and other locations uses the same values for N, M, and K; and
storing the keys in the index, the index being stored in a searchable computer readable medium;
wherein each key corresponds to one or more possible locations within the reference sequence.
Dependent claims: 2 to 26.
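The key-generation steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `key_at` and `build_index` are hypothetical, the key is assumed to be the plain concatenation of the N and M bases (the claim only says the bases are "used to generate" a key), positions are assumed to be 0-based, and an in-memory dictionary stands in for the searchable computer readable medium.

```python
from collections import defaultdict


def key_at(reference, pos, n, k, m):
    """Apply the (N, K, M) key pattern at 0-based position `pos`.

    Takes N contiguous bases starting at `pos` and M contiguous bases
    starting N+K positions after `pos`. Returns None if the pattern
    would run past the end of the reference.
    """
    second_start = pos + n + k
    if second_start + m > len(reference):
        return None
    # Assumption: the key is simply the two base runs concatenated.
    return reference[pos:pos + n] + reference[second_start:second_start + m]


def build_index(reference, n, k, m):
    """Map each key to every reference location where the pattern yields it."""
    if min(n, k, m) < 1:
        raise ValueError("N, K and M must be integers >= 1")
    index = defaultdict(list)
    span = n + k + m  # total number of positions covered by the pattern
    # The same N, K and M are used at every location.
    for pos in range(len(reference) - span + 1):
        index[key_at(reference, pos, n, k, m)].append(pos)
    return dict(index)
```

For the toy reference `ACGTACGTAC` with N=2, K=1, M=2, `build_index` returns `{"ACTA": [0, 4], "CGAC": [1, 5], "GTCG": [2], "TAGT": [3]}`, which shows how a single key can correspond to more than one possible location.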
27. A system for generating an index for oligomer sequence analysis, the index operable to determine where, in a reference sequence, a data set of one or more related oligomer sequences maps to the reference sequence, the oligomer sequences of the data set obtained from a same fragment of genetic material, the system comprising:
an interface configured to receive the reference sequence; and
a processor coupled to the interface, the processor configured to apply a key pattern to the reference sequence to obtain a plurality of keys for storing in the index, wherein the key pattern includes a first set of N contiguous positions separated by K positions from a second set of M contiguous positions, the separation being based on predicted relationships between oligomer sequences of the data set, wherein the key pattern is defined by predetermined values for N, K and M, the applying including:
applying the key pattern to a first location of the reference sequence to obtain a first set of bases, the first set of bases including:
N contiguous bases of the reference sequence starting from the first location, and
M contiguous bases of the reference sequence starting from N+K positions after the first location, N, M, and K being integers greater than or equal to one;
using the first set of bases to generate a first key;
applying the key pattern to a plurality of other locations to generate other keys, wherein the applying of the key pattern to the first and other locations uses the same values for N, M, and K,
wherein the keys correspond to possible locations in the reference sequence.
28. A computer program product for generating an index, the index operable to determine where, in a reference sequence, a data set of one or more related oligomer sequences maps to the reference sequence, the oligomer sequences of the data set obtained from a same fragment of genetic material, the computer program product being embodied in a non-transitory computer readable medium and comprising computer instructions for:
receiving a reference sequence; and
applying a key pattern to the reference sequence to obtain a plurality of keys for storing in the index, wherein the key pattern includes a first set of N contiguous positions separated by K positions from a second set of M contiguous positions, the separation being based on predicted relationships between oligomer sequences of the data set, wherein the key pattern is defined by predetermined values for N, K and M, the applying including:
applying the key pattern to a first location of the reference sequence to obtain a first set of bases, the first set of bases including:
N contiguous bases of the reference sequence starting from the first location, and
M contiguous bases of the reference sequence starting from N+K positions after the first location, N, M, and K being integers greater than or equal to one;
using the first set of bases to generate a first key;
applying the key pattern to a plurality of other locations to generate other keys, wherein the applying of the key pattern to the first and other locations uses the same values for N, M, and K,
wherein the keys correspond to possible locations in the reference sequence.
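The claims state that the index is "operable to determine where" a data set of related oligomer sequences maps, but they do not recite a lookup procedure. The following Python sketch shows one way such a query might work, under loudly stated assumptions: two oligomers from the same fragment are expected to lie K positions apart on the reference, the query key is formed from the last N bases of the first oligomer and the first M bases of the second, and keys are plain concatenations of bases. The names `build_index` and `candidate_locations` are hypothetical.

```python
from collections import defaultdict


def build_index(reference, n, k, m):
    """Index every location of the reference under its (N, K, M) key."""
    index = defaultdict(list)
    for pos in range(len(reference) - (n + k + m) + 1):
        second = pos + n + k  # start of the M-base run
        key = reference[pos:pos + n] + reference[second:second + m]
        index[key].append(pos)
    return index


def candidate_locations(index, first_oligo, second_oligo, n, m):
    """Return candidate 0-based start positions for `first_oligo`.

    Assumption: the gap between the two oligomers equals the K used to
    build the index, so the last N bases of the first oligomer and the
    first M bases of the second line up with the key pattern.
    """
    if len(first_oligo) < n or len(second_oligo) < m:
        raise ValueError("oligomers are shorter than the key pattern runs")
    key = first_oligo[-n:] + second_oligo[:m]
    # The index stores where the N-base run starts; shift back to the
    # start of the whole first oligomer.
    offset = len(first_oligo) - n
    return [pos - offset for pos in index.get(key, []) if pos >= offset]
```

The returned positions are only candidates: a matching key says nothing about the bases outside the N and M runs, so a real mapper would still have to compare the full oligomers against the reference at each candidate.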
Specification