Method of identifying virtual representations of nucleotide sequences

US 8,694,263 B2
Filed: 05/21/2004
Issued: 04/08/2014
Est. Priority Date: 05/23/2003
Status: Active Grant

Abstract

The invention provides oligonucleotide probes that can be used to hybridize to a representation of nucleic acid sequences. Compositions containing the probes such as microarrays are also provided. The invention also provides methods of using these probes and compositions in therapeutic, diagnostic, and research applications. Systems and methods for using a word counting algorithm that can quickly and accurately count the number of times a particular string of characters (i.e., nucleotides) appears in a nucleotide sequence (e.g., a genome) are provided. This algorithm can be used to identify the oligonucleotide probes of the invention. The algorithm uses a transform of a genome and an auxiliary data structure to count the number of times a particular word occurs in the genome.
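The word-counting idea can be illustrated with a minimal sketch. The abstract does not name the transform or the auxiliary data structure, so the example below swaps in a plain sorted suffix array with binary search as an assumption; it is not the patented algorithm. The names `build_suffix_array` and `count_word` are hypothetical.

```python
def build_suffix_array(genome):
    """Start positions of all suffixes in lexicographic order.

    Naive construction for illustration only; it is far too slow and
    memory-hungry for a real genome.
    """
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def count_word(genome, suffix_array, word):
    """Count (possibly overlapping) occurrences of `word` in `genome`.

    Every occurrence of `word` is a prefix of exactly one suffix, and those
    suffixes are contiguous in the sorted array, so two binary searches give
    the count.
    """
    m = len(word)

    def first_index(strict):
        # First array index whose m-character prefix is >= word
        # (strict=False) or > word (strict=True).
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            start = suffix_array[mid]
            prefix = genome[start:start + m]
            if prefix < word or (strict and prefix == word):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_index(True) - first_index(False)


genome = "ACGTACGTAC"              # toy sequence, not real data
sa = build_suffix_array(genome)
print(count_word(genome, sa, "ACG"))
```

Once the array is built, each query costs a number of string comparisons logarithmic in the sequence length, which is the kind of trade-off (precompute once, count many words quickly) the abstract describes.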

12 Claims

1. A method of identifying an oligonucleotide, the method comprising
- (A) cleaving a genome of at least Z basepairs in silico with a restriction enzyme to generate a plurality of predicted nucleic acid molecules,
- (B) generating a virtual representation of said genome by identifying predicted nucleic acid molecules, wherein each predicted nucleic acid molecule has a length of 200-1,200 basepairs, inclusive;
- (C) calculating the following;
  - (i) Z ≥ 1×10⁸;
  - (ii) 300 ≥ K ≥ 30;
  - (iii) the integer closest to (log₄(Z)+2) ≥ L₁ ≥ the integer closest to log₄(Z);
  - (iv) X is the integer closest to D₁×(K−L₁+1);
  - (v) Y is the integer closest to D₂×(K−L₁+1);
  - (vi) 1.5 ≥ D₁ ≥ 1; and
  - (vii) 1 > D₂ ≥ 0.5;
- (D) selecting oligonucleotides each having a length of K nucleotides, inclusive, and each with at least 90% sequence identity to a predicted nucleic acid molecule in (B);
- (E) identifying all of the L₁-mers occurring in each oligonucleotide; and
- (F) selecting one or more oligonucleotides that have a sum total value of L₁-mer counts in the virtual representation of no fewer than Y and no more than X, wherein an L₁-mer is a subregion of the oligonucleotide having a length of L₁ nucleotides, wherein an L₁-mer count is the number of times the sequence represented by one L₁-mer occurs in the genome, and wherein the sum total value of L₁-mer counts is the sum of every L₁-mer count of the oligonucleotide occurring in the virtual representation.
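The arithmetic in elements (C), (E) and (F) of claim 1 can be sketched as follows. This is an illustrative reading, not the patentee's implementation: the function names are hypothetical, ties in "integer closest to" are rounded up by assumption, the L₁-mer counts are assumed to arrive as a precomputed lookup table, and the 90% identity test of element (D) is not covered.

```python
import math
from collections import Counter


def nearest_int(x):
    """Integer closest to x; ties round up (an assumption, the claim is silent)."""
    return math.floor(x + 0.5)


def l1_bounds(z):
    """Element (iii): nearest(log4 Z) <= L1 <= nearest(log4 Z + 2)."""
    base = math.log(z, 4)
    return nearest_int(base), nearest_int(base + 2)


def count_bounds(k, l1, d1=1.0, d2=0.5):
    """Elements (iv)-(v): X and Y scale with the K - L1 + 1 windows of a K-mer."""
    windows = k - l1 + 1
    return nearest_int(d1 * windows), nearest_int(d2 * windows)  # (X, Y)


def lmer_count_sum(oligo, l1, counts):
    """Elements (E)-(F): add up the count of every L1-mer window in the oligo."""
    return sum(counts[oligo[i:i + l1]] for i in range(len(oligo) - l1 + 1))


def select_oligos(candidates, l1, counts, x, y):
    """Element (F): keep oligos whose summed count is between Y and X inclusive."""
    return [o for o in candidates if y <= lmer_count_sum(o, l1, counts) <= x]


# Parameter arithmetic for a genome of roughly 3e9 basepairs and K = 70.
low, high = l1_bounds(3_000_000_000)
x, y = count_bounds(k=70, l1=17)
print(low, high, x, y)

# Toy selection; sizes are far below the claimed ranges, for illustration only.
toy = "ACGTACGTAC"
toy_counts = Counter(toy[i:i + 3] for i in range(len(toy) - 2))
print(select_oligos(["ACGTA", "TTTTT"], 3, toy_counts, x=6, y=3))
```

With D₁ = 1, the upper bound X equals the number of L₁-mer windows in the oligonucleotide, so an oligonucleotide at that bound averages one occurrence per window.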
- 2. The method of claim 1, wherein the oligonucleotide is a nucleic acid probe.
- 3. The method of claim 1, wherein K is 40 to 70.
- 4. The method of claim 1, wherein the virtual representation has no more than R% of the complexity of said genome, wherein 70% ≥ R% ≥ 1%.
- 5. The method of claim 4, wherein R% is 1 to 2.5%.
- 6. The method of claim 1, wherein Z is at least 1×10⁹.
- 7. The method of claim 1, wherein the genome is a mammalian genome.
- 8. The method of claim 1, wherein the genome is a human genome.
- 9. The method of claim 1, wherein D₁ is 1.
- 10. The method of claim 1, wherein D₂ is 0.5.
- 11. The method of claim 1, wherein L₁ is 15, 16, 17, 18, 19, 20, 21, 22, 23 or 24.
- 12. The method of claim 1, wherein said representation is obtained with two or more different restriction endonucleases.
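The in silico cleavage and size selection of claim 1, elements (A) and (B), together with the complexity limit of claim 4 and the multi-enzyme case of claim 12, might be sketched as below. Several points are assumptions rather than statements from the claims: the function names are hypothetical, the sequence is treated as a single linear strand, claim 12 is read as one combined digest, and "complexity" is taken to mean the fraction of genome length covered by the retained fragments. The claims name no enzyme; BglII (site AGATCT, cut after the first base) appears only as a familiar example.

```python
import re


def digest(genome, enzymes):
    """Cut a linear sequence at every recognition site.

    `enzymes` is a list of (recognition_site, cut_offset) pairs, where
    cut_offset is the number of bases into the site at which the cut falls.
    Sites from all enzymes are pooled, i.e. a combined digest (assumption).
    """
    cuts = set()
    for site, offset in enzymes:
        # Lookahead pattern so overlapping site occurrences are all found.
        for match in re.finditer("(?=" + re.escape(site) + ")", genome):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def virtual_representation(fragments, min_len=200, max_len=1200):
    """Keep predicted fragments whose length lies in the inclusive size window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def complexity_percent(representation, genome_length):
    """Share of the genome covered by the representation, in percent."""
    return 100.0 * sum(len(f) for f in representation) / genome_length


# Toy sequence with two BglII sites; the size window is shrunk to fit it.
toy = "AAAGATCTCCCCAGATCTGG"
fragments = digest(toy, [("AGATCT", 1)])
rep = virtual_representation(fragments, min_len=5, max_len=10)
print(fragments, rep, complexity_percent(rep, len(toy)))
```

A real pipeline would also need to handle both strands and the unflanked fragments at the ends of each chromosome, which this sketch keeps for simplicity.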

Current Assignee: Cold Spring Harbor Laboratory
Original Assignee: Cold Spring Harbor Laboratory
Inventors: Healy, John; Lucito, Robert; Wigler, Michael H
Primary Examiner(s): Skibinsky, Anna
Application Number: US10/851,779
Publication Number: US 20050032095A1
Time in Patent Office: 3,609 Days
Field of Search: None
US Class Current: 702/19
CPC Class Codes: C07H 21/04 with deoxyribosyl as saccha...
