AB INITIO GENERATION OF SINGLE COPY GENOMIC PROBES

US 20120253689A1
Filed: 05/11/2012
Published: 10/04/2012
Est. Priority Date: 06/07/2005
Status: Active Grant


Abstract

Single copy sequences suitable for use as DNA probes can be defined by computational analysis of genomic sequences. The present invention provides an ab initio method for identification of single copy sequences for use as probes which obviates the need to compare genomic sequences with existing catalogs of repetitive sequences. By dividing a target reference sequence into a series of shorter contiguous sequence windows and comparing these sequences with the reference genome sequence, one can identify single copy sequences in a genome. Probes can then be designed and produced from these single copy intervals.
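The windowing idea in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented procedure: the function names (`identity`, `count_occurrences`, `screen`) are hypothetical, matching is a naive ungapped, forward-strand comparison rather than a real alignment search, and the toy reference is random sequence built for the example.

```python
import random

def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def count_occurrences(window, reference, min_identity=0.8):
    """Count reference offsets where `window` aligns, ungapped, at or above `min_identity`.

    Assumption: a real pipeline would use a local alignment tool and both strands.
    """
    w = len(window)
    return sum(
        identity(window, reference[i:i + w]) >= min_identity
        for i in range(len(reference) - w + 1)
    )

def screen(reference, start, end, window=50, min_identity=0.8):
    """Split reference[start:end] into contiguous windows and count each one
    against the whole reference. A count of 1 marks a single copy candidate."""
    return [
        (s, count_occurrences(reference[s:s + window], reference, min_identity))
        for s in range(start, end - window + 1, window)
    ]

# Toy reference: one 50-nt element placed twice, separated by unique sequence.
rng = random.Random(0)

def rand_seq(n):
    return "".join(rng.choice("ACGT") for _ in range(n))

repeat, unique_a, unique_b = rand_seq(50), rand_seq(50), rand_seq(50)
reference = repeat + unique_a + repeat + unique_b

calls = screen(reference, 0, len(reference))
single_copy_starts = [s for s, n in calls if n == 1]
```

Here the windows starting at 50 and 150 are the single copy candidates, while the windows at 0 and 100 each count two occurrences. The 50-nucleotide window and the 0.8 identity threshold are taken from the ranges recited in the claims; any other value in those ranges could be passed instead.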

146 Citations

18 Claims

1. A method of producing a hybridization probe of a target reference complete genome sequence, wherein a single copy sequence is identified by a method of successive division of the target reference genome sequence into subintervals and comparison of the subintervals to the target reference sequence, said method comprising:
- (A) determining a count of the number of times a subsequence of a first screened sequence occurs in the target reference genome sequence, said screened sequence being at least one subinterval of the target reference genome sequence obtained by division of the target reference genome sequence, wherein
  - (1) the target reference genome sequence comprises the first screened sequence,
  - (2) the first screened sequence comprises at least two subsequences, and
  - (3) a single copy interval of the first screened sequence is identified as (i) a subsequence of the first screened sequence with a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the subsequence having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence, or (ii) a group of contiguous subsequences of the first screened sequence, each member being a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the group of contiguous subsequences having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence;
- (B) determining a count of the number of times a subsequence of a second screened sequence occurs in the target reference genome sequence, said screened sequence being at least one subinterval of the target reference genome sequence, wherein
  - (1) the second screened sequence comprises a single copy interval of the first screened sequence;
  - (2) the second screened sequence overlaps the single copy interval of the first screened sequence;
  - (3) the subsequences of the second screened sequence are either (i) consecutive non-overlapping subintervals of the second screened sequence or (ii) overlapping non-identical subintervals of the second screened sequence; and
  - (4) a single copy interval of the second screened sequence is identified as (i) a subsequence of the second screened sequence with a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the subsequence having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence, or (ii) a group of contiguous subsequences of the second screened sequence, each member being a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the group of contiguous subsequences having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence; and
- (C) identifying a single copy interval and at least one contiguous divergent repetitive interval of the target reference sequence wherein at least one subsequence in the target sequence contains a divergent repetitive element suitable for use as a probe that hybridizes to a single location in the target genome, wherein said divergent repetitive element is washed under conditions that eliminate cross-hybridization to other target sequences in the genome.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 2. The method of claim 1, wherein the method further comprises a step of determining a count of the number of times a subsequence of a third screened sequence occurs in the target reference sequence, wherein
    - (A) the third screened sequence comprises a single copy interval of the second screened sequence;
    - (B) the third screened sequence overlaps the single copy interval of the second screened sequence;
    - (C) the subsequences of the third screened sequence are either (i) consecutive non-overlapping subintervals or (ii) overlapping non-identical subintervals; and
    - (D) a single copy interval of the third screened sequence is identified as (i) a subsequence of the third screened sequence with a single subsequence occurrence in the target reference sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the subsequence having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence, or (ii) a group of contiguous subsequences of the third screened sequence, each member being a single subsequence occurrence in the target reference sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the group of contiguous subsequences having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence.
  - 3. The method of claim 1, wherein the method further comprises a step of determining a count of the number of times a subsequence of a fourth screened sequence occurs in the target reference sequence, wherein
    - (A) the fourth screened sequence comprises a single copy interval of the third screened sequence;
    - (B) the fourth screened sequence overlaps the single copy interval of the third screened sequence;
    - (C) the subsequences of the fourth screened sequence are either (i) consecutive non-overlapping subintervals or (ii) overlapping non-identical subintervals; and
    - (D) a single copy interval of the fourth screened sequence is identified as (i) a subsequence of the fourth screened sequence with a single subsequence occurrence in the target reference sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the subsequence having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence, or (ii) a group of contiguous subsequences of the fourth screened sequence, each member being a single subsequence occurrence in the target reference sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the group of contiguous subsequences having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence.
  - 4. The method of claim 1, wherein said method further comprises a step of identifying a subsequence of the first or second screened sequences with at least two occurrences in the target reference sequence as a subsequence containing a repetitive element wherein the single copy interval is located adjacent to the repetitive element.
  - 5. The method of claim 4, wherein said method further comprises a step of identifying a second, distinct subsequence of the first or second screened sequences with at least two occurrences in the target reference sequence as a subsequence containing a different repetitive element, wherein the single copy interval is located between the first and the second subsequences containing the differing repetitive elements.
  - 6. The method of claim 3, wherein the second, third, or fourth screened sequence comprises (i) a centromeric end that overlaps the single copy interval of the first, second, or third screened sequence, respectively; (ii) a telomeric end that overlaps the single copy interval of the first, second, or third screened sequence, respectively; or (iii) a centromeric and telomeric end that both overlap the single copy interval of the first, second, or third screened sequence, respectively.
  - 7. The method of claim 6, wherein said method further comprises a step of determining whether an extended test sequence extends in the direction toward the centromere of the chromosomal arm containing the subsequence.
  - 8. The method of claim 3, wherein a subsequence is (i) at least about 100 consecutive non-overlapping nucleotides; (ii) at least about 200 consecutive non-overlapping nucleotides; (iii) at least about 400 consecutive non-overlapping nucleotides; (iv) at least about 600 consecutive non-overlapping nucleotides; (v) at least about 800 consecutive non-overlapping nucleotides; or (vi) at least about 1000 consecutive non-overlapping nucleotides.
  - 9. The method of claim 1, wherein the target reference sequence is about 100,000 nucleotides to about 400,000 nucleotides.
  - 10. The method of claim 1, wherein the target reference sequence is a sequenced genome of an organism.
  - 11. The method of claim 10, wherein the target reference sequence is a sequenced genome of a human.
  - 12. The method of claim 1, wherein the overlapping subintervals of the screened sequence are displaced by at least about 20 nucleotides from adjacent subintervals.
  - 13. The method of claim 1, wherein the probe comprises at least two contiguous subsequences of a screened sequence, each of the contiguous subsequences having a single occurrence in the target reference complete genome.
  - 14. The method of claim 1, wherein the sequence of said divergent repetitive element exhibits less than or equal to 70% identity with the sequence of another member of the same repetitive sequence family.
  - 15. The method of claim 1 where the post-hybridization wash is a solution of 0.1× SSC at a temperature exceeding 42 degrees Celsius and the solution comprises 15 mM NaCl and 1.5 mM sodium citrate (Na₃C₆H₅O₇).
  - 16. The method of claim 1 where the post-hybridization wash is a solution of 0.2× SSC at a temperature exceeding 37 degrees Celsius and the solution comprises 30 mM NaCl and 3 mM sodium citrate (Na₃C₆H₅O₇).
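
Claims 12 and 13 describe overlapping subintervals displaced by a fixed number of nucleotides, and probes assembled from contiguous single copy subsequences. A minimal sketch of both ideas follows. The names `overlapping_windows` and `merge_single_copy` are hypothetical, the occurrence counts are supplied by hand rather than computed, and closing a run at the first window that is not single copy is a simplifying assumption, not something the claims prescribe.

```python
def overlapping_windows(start, end, size, step=20):
    """Yield (start, end) windows of `size` nucleotides, each displaced by
    `step` nucleotides from the previous one."""
    for s in range(start, end - size + 1, step):
        yield (s, s + size)

def merge_single_copy(windows, counts):
    """Merge runs of consecutive windows whose occurrence count is exactly 1.

    Assumes `windows` is sorted by start and that neighbouring windows touch
    or overlap. A window with any other count closes the current run.
    """
    intervals = []
    run_open = False
    for (s, e), count in zip(windows, counts):
        if count == 1:
            if run_open:
                intervals[-1] = (intervals[-1][0], e)  # extend the open run
            else:
                intervals.append((s, e))               # start a new run
            run_open = True
        else:
            run_open = False
    return intervals

# 100-nt windows displaced by 20 nt across a 200-nt screened sequence.
wins = list(overlapping_windows(0, 200, 100))
# Illustrative counts only: the first and the last two windows hit a repeat.
intervals = merge_single_copy(wins, [2, 1, 1, 1, 2, 2])
```

With these made-up counts the three single copy windows merge into one candidate interval, (20, 160). Trimming the flanks that overlap neighbouring repeat-containing windows is left out of the sketch.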

17. A method of producing a hybridization probe of a target reference complete genome sequence, wherein a single copy sequence is identified by a method of successive division of the target reference genome sequence into subintervals and comparison of the subintervals to the target reference sequence, said method comprising:
- (A) determining a count of the number of times a subsequence of a first screened sequence occurs in the target reference genome sequence, said screened sequence being at least one subinterval of the target reference genome sequence obtained by division of the target reference genome sequence, wherein
  - (1) the target reference genome sequence comprises the first screened sequence,
  - (2) the first screened sequence comprises at least two subsequences, and
  - (3) a single copy interval of the first screened sequence is identified as (i) a subsequence of the first screened sequence with a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the subsequence having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence, or (ii) a group of contiguous subsequences of the first screened sequence, each member being a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the group of contiguous subsequences having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence;
- (B) determining a count of the number of times a subsequence of a second screened sequence occurs in the target reference genome sequence, said screened sequence being at least one subinterval of the target reference genome sequence, wherein
  - (1) the second screened sequence comprises a single copy interval of the first screened sequence;
  - (2) the second screened sequence overlaps the single copy interval of the first screened sequence;
  - (3) the subsequences of the second screened sequence are either (i) consecutive non-overlapping subintervals of the second screened sequence or (ii) overlapping non-identical subintervals of the second screened sequence; and
  - (4) a single copy interval of the second screened sequence is identified as (i) a subsequence of the second screened sequence with a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the subsequence having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence, or (ii) a group of contiguous subsequences of the second screened sequence, each member being a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the group of contiguous subsequences having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence; and
- (C) identifying a single copy interval and at least one contiguous divergent repetitive interval of the target reference sequence wherein at least one subsequence in the target sequence contains a divergent repetitive element suitable for use as a probe that hybridizes to a single location in the target genome, wherein said divergent repetitive element is washed under conditions that eliminate cross-hybridization to other target sequences in the genome, where such conditions comprise washing the hybridized probe in a solution of 0.1× SSC (15 mM NaCl and 1.5 mM Na₃C₆H₅O₇) at a temperature exceeding 42 degrees Celsius.

18. A method of producing a hybridization probe of a target reference complete genome sequence, wherein a single copy sequence is identified by a method of successive division of the target reference genome sequence into subintervals and comparison of the subintervals to the target reference sequence, said method comprising:
- (A) determining a count of the number of times a subsequence of a first screened sequence occurs in the target reference genome sequence, said screened sequence being at least one subinterval of the target reference genome sequence obtained by division of the target reference genome sequence, wherein
  - (1) the target reference genome sequence comprises the first screened sequence,
  - (2) the first screened sequence comprises at least two subsequences, and
  - (3) a single copy interval of the first screened sequence is identified as (i) a subsequence of the first screened sequence with a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the subsequence having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence, or (ii) a group of contiguous subsequences of the first screened sequence, each member being a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the group of contiguous subsequences having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence;
- (B) determining a count of the number of times a subsequence of a second screened sequence occurs in the target reference genome sequence, said screened sequence being at least one subinterval of the target reference genome sequence, wherein
  - (1) the second screened sequence comprises a single copy interval of the first screened sequence;
  - (2) the second screened sequence overlaps the single copy interval of the first screened sequence;
  - (3) the subsequences of the second screened sequence are either (i) consecutive non-overlapping subintervals of the second screened sequence or (ii) overlapping non-identical subintervals of the second screened sequence; and
  - (4) a single copy interval of the second screened sequence is identified as (i) a subsequence of the second screened sequence with a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the subsequence having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence, or (ii) a group of contiguous subsequences of the second screened sequence, each member being a single subsequence occurrence in the target reference genome sequence, wherein an occurrence is defined by at least about 50 consecutive nucleotides of the group of contiguous subsequences having (i) at least about 60% homology to the target reference sequence; (ii) at least about 70% homology to the target reference sequence; or (iii) at least about 80% homology to the target reference sequence; and
- (C) identifying a single copy interval and at least one contiguous divergent repetitive interval of the target reference sequence wherein at least one subsequence in the target sequence contains a divergent repetitive element suitable for use as a probe that hybridizes to a single location in the target genome, wherein said divergent repetitive element is washed under conditions that eliminate cross-hybridization to other target sequences in the genome, where such conditions comprise washing the hybridized probe in a solution of 0.2× SSC (30 mM NaCl and 3 mM Na₃C₆H₅O₇) at a temperature exceeding 37 degrees Celsius.
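
The salt concentrations recited for the 0.1× and 0.2× SSC washes follow from simple dilution arithmetic. The sketch below assumes the commonly used definition of 1× SSC as 150 mM NaCl and 15 mM trisodium citrate (a 20× stock of 3 M NaCl and 0.3 M citrate); the patent text itself states only the diluted values. The function name `ssc_composition` is hypothetical.

```python
# Assumed 1x SSC composition, in millimolar (from the common 20x stock recipe).
NACL_MM_AT_1X = 150.0
CITRATE_MM_AT_1X = 15.0

def ssc_composition(strength):
    """Return (NaCl mM, sodium citrate mM) for an SSC wash of the given
    strength, e.g. 0.1 for 0.1x SSC. Values are rounded to suppress
    floating-point noise."""
    return (
        round(NACL_MM_AT_1X * strength, 6),
        round(CITRATE_MM_AT_1X * strength, 6),
    )

wash_claim_17 = ssc_composition(0.1)  # 0.1x SSC, used above 42 degrees Celsius
wash_claim_18 = ssc_composition(0.2)  # 0.2x SSC, used above 37 degrees Celsius
```

Under that assumption, 0.1× SSC works out to 15 mM NaCl and 1.5 mM citrate, and 0.2× SSC to 30 mM NaCl and 3 mM citrate, which agrees with the figures in claims 15 to 18.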

Current Assignee: Peter K. Rogan
Original Assignee: Peter K. Rogan
Inventors: Rogan, Peter K.
Granted Patent: US 8,407,013 B2
US Class Current: 702/20
CPC Class Codes:
G16B 25/00   ICT specially adapted for h...
G16B 25/20   Polymerase chain reaction [...
G16B 30/00   ICT specially adapted for s...
G16B 30/10   Sequence alignment; Homolog...
