Method of identifying virtual representations of nucleotide sequences
First Claim
1. A method of identifying an oligonucleotide, the method comprising
(A) cleaving a genome of at least Z basepairs in silico with a restriction enzyme to generate a plurality of predicted nucleic acid molecules,
(B) generating a virtual representation of said genome by identifying predicted nucleic acid molecules, wherein each predicted nucleic acid molecule has a length of 200-1,200 basepairs, inclusive;
(C) calculating the following;
    (i) Z ≥ 1×10^8;
    (ii) 300 ≥ K ≥ 30;
    (iii) the integer closest to (log4(Z)+2) ≥ L1 ≥ the integer closest to log4(Z);
    (iv) X is the integer closest to D1×(K−L1+1);
    (v) Y is the integer closest to D2×(K−L1+1);
    (vi) 1.5 ≥ D1 ≥ 1; and
    (vii) 1 > D2 ≥ 0.5;
(D) selecting oligonucleotides each having a length of K nucleotides, inclusive, and each with at least 90% sequence identity to a predicted nucleic acid molecule in (B);
(E) identifying all of the L1-mers occurring in each oligonucleotide; and
(F) selecting one or more oligonucleotides that have a sum total value of L1-mer counts in the virtual representation of no fewer than Y and no more than X, wherein an L1-mer is a subregion of the oligonucleotide having a length of L1 nucleotides, wherein an L1-mer count is the number of times the sequence represented by one L1-mer occurs in the genome, and wherein the sum total value of L1-mer counts is the sum of every L1-mer count of the oligonucleotide occurring in the virtual representation.
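The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`digest`, `size_select`, `l1_range`, `bounds`, `kmer_counts`, `select_oligos`) are hypothetical, the toy inputs are far below the claimed Z ≥ 1×10^8, counting is done on one strand only, "integer closest" is assumed to round halves up, and the 90% identity allowance of step (D) is not modeled (candidates are taken as given).

```python
import math
from collections import Counter


def nearest_int(value: float) -> int:
    """'Integer closest to', assuming halves round up (the claim does not say)."""
    return math.floor(value + 0.5)


def digest(genome: str, site: str, cut_offset: int) -> list[str]:
    """Step (A): cut in silico at every occurrence of `site`,
    `cut_offset` bases into the recognition sequence."""
    fragments, start = [], 0
    pos = genome.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(genome[start:cut])
        start = cut
        pos = genome.find(site, pos + 1)
    fragments.append(genome[start:])
    return [f for f in fragments if f]


def size_select(fragments: list[str], lo: int = 200, hi: int = 1200) -> list[str]:
    """Step (B): keep fragments of 200-1,200 bp, inclusive."""
    return [f for f in fragments if lo <= len(f) <= hi]


def l1_range(z: int) -> range:
    """Step (C)(iii): allowed word lengths L1 for a genome of Z basepairs."""
    log4_z = math.log(z, 4)
    return range(nearest_int(log4_z), nearest_int(log4_z + 2) + 1)


def bounds(k: int, l1: int, d1: float, d2: float) -> tuple[int, int]:
    """Steps (C)(iv)-(v): upper bound X and lower bound Y."""
    words_per_oligo = k - l1 + 1
    return nearest_int(d1 * words_per_oligo), nearest_int(d2 * words_per_oligo)


def kmer_counts(genome: str, l1: int) -> Counter:
    """Count every L1-mer in the genome (single strand; an assumption)."""
    return Counter(genome[i:i + l1] for i in range(len(genome) - l1 + 1))


def select_oligos(candidates: list[str], counts: Counter,
                  l1: int, x: int, y: int) -> list[str]:
    """Steps (E)-(F): keep oligos whose summed L1-mer counts lie in [Y, X]."""
    selected = []
    for oligo in candidates:
        total = sum(counts[oligo[i:i + l1]] for i in range(len(oligo) - l1 + 1))
        if y <= total <= x:
            selected.append(oligo)
    return selected


if __name__ == "__main__":
    print(list(l1_range(10**8)))        # word lengths allowed at Z = 1e8
    print(bounds(70, 15, 1.5, 0.5))     # X and Y for K = 70, L1 = 15
    toy = "ACGTACGTTTGACC"              # toy genome, far below the claimed Z
    x, y = bounds(6, 3, 1.0, 0.5)
    print(select_oligos(["TTGACC", "ACGTAC"], kmer_counts(toy, 3), 3, x, y))
```

An oligonucleotide whose L1-mers are all unique in the genome has a sum of exactly K−L1+1, so the window [Y, X] admits oligonucleotides whose words are, on average, close to single-copy.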
Abstract
The invention provides oligonucleotide probes that can be used to hybridize to a representation of nucleic acid sequences. Compositions containing the probes, such as microarrays, are also provided. The invention also provides methods of using these probes and compositions in therapeutic, diagnostic, and research applications. Systems and methods for using a word counting algorithm that can quickly and accurately count the number of times a particular string of characters (i.e., nucleotides) appears in a nucleotide sequence (e.g., a genome) are provided. This algorithm can be used to identify the oligonucleotide probes of the invention. The algorithm uses a transform of a genome and an auxiliary data structure to count the number of times a particular word occurs in the genome.
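The abstract does not name the transform or the auxiliary data structure. As a stand-in, the sketch below counts words with a plain sorted suffix array and binary search, which is a different and simpler index chosen only to illustrate counting a word without rescanning the sequence. The function names `build_suffix_array` and `count_word` are hypothetical, and the naive construction is suitable for toy inputs only, not for a genome.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.
    Naive construction: fine for short strings, far too slow for a genome."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_word(text: str, suffix_array: list[int], word: str) -> int:
    """Number of (possibly overlapping) occurrences of `word` in `text`.

    All suffixes that start with `word` sit in one contiguous block of the
    suffix array, so two binary searches find the block's boundaries.
    """
    m = len(word)

    def prefix(rank: int) -> str:
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower boundary: first suffix whose m-character prefix is >= word.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < word:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper boundary: first suffix whose m-character prefix is > word.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= word:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


if __name__ == "__main__":
    sequence = "ACGTACGTTTGACC"  # toy sequence, not a genome
    index = build_suffix_array(sequence)
    for w in ("ACG", "TT", "GGG"):
        print(w, count_word(sequence, index, w))
```

Once the index is built, each query costs a number of comparisons logarithmic in the sequence length, which is the property that makes repeated word counting over many candidate probes practical.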
12 Claims
Specification