Method and apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing
Abstract
This invention provides methods by which biologically derived DNA sequences in a mixed sample or in an arrayed single sequence clone can be determined and classified without sequencing. To determine a sample sequence, the methods combine information on the presence of carefully chosen target subsequences, typically 4 to 8 base pairs long, and preferably on the length between target subsequences in a sample DNA sequence, with DNA sequence databases listing sequences likely to be present in the sample. The preferred method uses restriction endonucleases to recognize target subsequences and cut the sample sequence. Carefully chosen recognition moieties are then ligated to the cut fragments, the fragments are amplified, and the experimental observation is made. Polymerase chain reaction (PCR) is the preferred method of amplification. Another embodiment of the invention uses information on the presence or absence of carefully chosen target subsequences in a single sequence clone, together with DNA sequence databases, to determine the clone sequence. Computer-implemented methods are provided to analyze the experimental results, to determine the sample sequences in question, and to choose target subsequences so that experiments yield the maximum amount of information.
141 Claims
1. A method for identifying, classifying, or quantifying one or more nucleic acids in a sample comprising a plurality of nucleic acids having different nucleotide sequences, said method comprising:
(a) probing said sample with one or more recognition means, each recognition means recognizing a different target nucleotide subsequence or a different set of target nucleotide subsequences;
(b) generating one or more output signals from said sample probed by said recognition means, each output signal being produced from a nucleic acid in said sample by recognition of one or more target nucleotide subsequences in said nucleic acid by said recognition means and comprising a representation of (i) the length between occurrences of target nucleotide subsequences in said nucleic acid, and (ii) the identities of said target nucleotide subsequences in said nucleic acid or the identities of said sets of target nucleotide subsequences among which are included the target nucleotide subsequences in said nucleic acid; and
(c) searching a nucleotide sequence database to determine sequences that are predicted to produce or the absence of any sequences that are predicted to produce said one or more output signals produced by said nucleic acid, said database comprising a plurality of known nucleotide sequences of nucleic acids that may be present in the sample, a sequence from said database being predicted to produce said one or more output signals when the sequence from said database has both (i) the same length between occurrences of target nucleotide subsequences as is represented by said one or more output signals, and (ii) the same target nucleotide subsequences as are represented by said one or more output signals, or target nucleotide subsequences that are members of the same sets of target nucleotide subsequences represented by said one or more output signals, whereby said one or more nucleic acids in said sample are identified, classified, or quantified.
Dependent claims: 2-80, 122-134, 136, 137, 140
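
The database search of step (c) in claim 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation. It assumes that each recognition means matches an exact target string, that a signal arises from each pair of neighbouring target occurrences, and that "length" means the start-to-start distance in bases. The names `find_all`, `predicted_signals`, `search_database`, and the toy sequences are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

# (upstream target, downstream target, length between them)
Signal = Tuple[str, str, int]


def find_all(seq: str, sub: str) -> Iterator[int]:
    """Yield the start index of every (possibly overlapping) occurrence of sub."""
    pos = seq.find(sub)
    while pos != -1:
        yield pos
        pos = seq.find(sub, pos + 1)


def predicted_signals(seq: str, targets: List[str]) -> Set[Signal]:
    """Signals a database sequence is predicted to produce.

    Assumption: one signal per pair of neighbouring target occurrences,
    with length measured start-to-start in bases.
    """
    hits = sorted((pos, t) for t in targets for pos in find_all(seq, t))
    return {(a, b, j - i) for (i, a), (j, b) in zip(hits, hits[1:])}


def search_database(db: Dict[str, str], targets: List[str],
                    observed: Signal) -> List[str]:
    """Names of database sequences predicted to produce the observed signal.

    An empty list corresponds to the claim's "absence of any sequences".
    """
    return [name for name, seq in db.items()
            if observed in predicted_signals(seq, targets)]


# Toy database (hypothetical sequences, not real genes).
db = {
    "seqA": "AAGAATTCAAAAGGATCCTT",
    "seqB": "GAATTCGGATCC",
}
targets = ["GAATTC", "GGATCC"]

print(search_database(db, targets, ("GAATTC", "GGATCC", 10)))  # ['seqA']
print(search_database(db, targets, ("GAATTC", "GGATCC", 7)))   # []
```

A real measurement would carry some length uncertainty, so an exact-match lookup like this one would normally be replaced by a match within a tolerance.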

81. A method for identifying or classifying a nucleic acid in a sample comprising a plurality of nucleic acids having different nucleotide sequences, said method comprising:
(a) probing said nucleic acid with a plurality of recognition means, each recognition means recognizing a target nucleotide subsequence or a set of target nucleotide subsequences, in order to produce an output set of signals, each signal of said output set representing whether said target nucleotide subsequence or one of said set of target nucleotide subsequences is present in said nucleic acid; and
(b) searching a nucleotide sequence database, said database comprising a plurality of known nucleotide sequences of nucleic acids that may be present in the sample, for sequences predicted to produce said output set of signals, a sequence from said database being predicted to produce an output set of signals when the sequence from said database (i) comprises the same target nucleotide subsequences represented as present, or comprises target nucleotide subsequences that are members of the sets of target nucleotide subsequences represented as present by the output set of signals, and (ii) does not comprise the target nucleotide subsequences not represented as present or that are members of the sets of target nucleotide subsequences not represented as present by the output set of signals, whereby the nucleic acid is identified or classified.
Dependent claims: 82-101, 138, 141
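
The presence/absence matching of claim 81 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: each recognition means is modelled as a list of exact target strings, and its signal is "present" if any member of that list occurs in the sequence. The names `output_signals`, `identify`, and the toy clone sequences are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def output_signals(seq: str, recognition_means: List[List[str]]) -> Tuple[bool, ...]:
    """One boolean per recognition means: True if any of its targets occurs in seq."""
    return tuple(any(t in seq for t in target_set)
                 for target_set in recognition_means)


def identify(db: Dict[str, str], recognition_means: List[List[str]],
             observed: Sequence[bool]) -> List[str]:
    """Names of database sequences whose predicted signal set equals the observed one.

    Equality enforces both conditions of step (b): targets represented as
    present must occur, and targets not represented as present must not.
    """
    wanted = tuple(observed)
    return [name for name, seq in db.items()
            if output_signals(seq, recognition_means) == wanted]


# Hypothetical recognition means: the second recognizes a set of two targets.
means = [["GAATTC"], ["GGATCC", "AGATCT"]]

# Toy database (hypothetical sequences).
clones = {
    "cloneA": "ACGTGAATTCAC",
    "cloneB": "TTAGATCTAA",
    "cloneC": "GAATTCGGATCC",
}

print(identify(clones, means, (True, False)))  # ['cloneA']
print(identify(clones, means, (True, True)))   # ['cloneC']
```

With n recognition means there are at most 2^n distinct signal sets, so the number of means bounds how many database sequences can be told apart by this scheme.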

102. A method for identifying, classifying, or quantifying DNA molecules in a sample of DNA molecules having a plurality of different nucleotide sequences, the method comprising the steps of:
(a) digesting said sample with one or more restriction endonucleases, each said restriction endonuclease recognizing a subsequence recognition site and digesting DNA at said recognition site to produce fragments with 5' overhangs;
(b) contacting said fragments with shorter and longer oligodeoxynucleotides, wherein each said shorter oligodeoxynucleotide comprises a first subsequence 5' to a second subsequence, said first subsequence being hybridizable to a 5' overhang and said second subsequence being hybridizable to a longer oligodeoxynucleotide;
(c) ligating said longer oligodeoxynucleotides to said 5' overhangs on said DNA fragments to produce ligated DNA fragments and removing said shorter oligodeoxynucleotides from said ligated DNA fragments;
(d) extending said ligated DNA fragments by synthesis with a DNA polymerase to produce blunt-ended double stranded DNA fragments;
(e) amplifying said blunt-ended double stranded DNA fragments by a method comprising contacting said DNA fragments with a DNA polymerase and primer oligodeoxynucleotides, each said primer oligodeoxynucleotide having a sequence comprising that of one of the longer oligodeoxynucleotides;
(f) determining the length of the amplified DNA fragments; and
(g) searching a DNA sequence database, said database comprising a plurality of known DNA sequences that may be present in the sample, for sequences predicted to produce one or more of said fragments of determined length, a sequence from said database being predicted to produce a fragment of determined length when the sequence from said database comprises recognition sites of said one or more restriction endonucleases spaced apart by the determined length, whereby DNA molecules in said sample are identified, classified, or quantified.
Dependent claims: 103-120, 139
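
Step (g) of claim 102 amounts to an in-silico digest of each database sequence. The Python sketch below shows one way this might look, under simplifying assumptions that are not taken from the patent: only top-strand cut positions are considered, only fragments lying between two recognition sites are predicted, and the extra length contributed by ligated oligodeoxynucleotides is ignored. EcoRI (G^AATTC) and BamHI (G^GATCC) are used as example enzymes that leave 5' overhangs; the function names, the `tolerance` parameter, and the toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple

# Recognition site and top-strand cut offset within the site.
# EcoRI cuts G^AATTC and BamHI cuts G^GATCC; both leave 5' overhangs.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
}


def cut_positions(seq: str, enzymes: List[str]) -> List[int]:
    """Sorted top-strand cut positions for all recognition sites in seq."""
    cuts = []
    for name in enzymes:
        site, offset = ENZYMES[name]
        pos = seq.find(site)
        while pos != -1:
            cuts.append(pos + offset)
            pos = seq.find(site, pos + 1)
    return sorted(cuts)


def fragment_lengths(seq: str, enzymes: List[str]) -> List[int]:
    """Lengths of fragments bounded on both sides by a recognition site."""
    cuts = cut_positions(seq, enzymes)
    return [b - a for a, b in zip(cuts, cuts[1:])]


def search_by_length(db: Dict[str, str], enzymes: List[str],
                     measured: int, tolerance: int = 0) -> List[str]:
    """Database entries predicted to yield a fragment of the measured length.

    `tolerance` (in bases) is an assumed allowance for measurement error.
    """
    return [name for name, seq in db.items()
            if any(abs(length - measured) <= tolerance
                   for length in fragment_lengths(seq, enzymes))]


# Toy database (hypothetical sequences).
dna_db = {
    "seqA": "AAGAATTCAAAAGGATCCTT",
    "seqB": "GAATTCGGATCC",
}

print(fragment_lengths(dna_db["seqA"], ["EcoRI", "BamHI"]))        # [10]
print(search_by_length(dna_db, ["EcoRI", "BamHI"], 10))            # ['seqA']
print(search_by_length(dna_db, ["EcoRI", "BamHI"], 8, tolerance=2))  # ['seqA', 'seqB']
```

In practice the measured length of an amplified fragment would include the ligated oligodeoxynucleotides at both ends, so that constant would have to be subtracted before the lookup.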

121. A method for identifying, classifying, or quantifying DNA molecules in a sample of DNA molecules with a plurality of nucleotide sequences, the method comprising the steps of:
(a) digesting said sample with one or more restriction endonucleases, each said restriction endonuclease recognizing a subsequence recognition site and digesting DNA to produce fragments with 3' overhangs;
(b) contacting said fragments with shorter and longer oligodeoxynucleotides, each said longer oligodeoxynucleotide consisting of a first and second contiguous portion, said first portion being a 3' end subsequence complementary to the overhang produced by one of said restriction endonucleases, each said shorter oligodeoxynucleotide complementary to the 3' end of said second portion of said longer oligodeoxynucleotide strand;
(c) ligating said longer oligodeoxynucleotides to said DNA fragments to produce ligated fragments and removing said shorter oligodeoxynucleotides from said ligated DNA fragments;
(d) extending said ligated DNA fragments by synthesis with a DNA polymerase to form blunt-ended double stranded DNA fragments;
(e) amplifying said double stranded DNA fragments by use of a DNA polymerase and primer oligodeoxynucleotides to produce amplified DNA fragments, each said primer oligodeoxynucleotide having a sequence comprising that of a longer oligodeoxynucleotide;
(f) determining the length of the amplified DNA fragments; and
(g) searching a DNA sequence database, said database comprising a plurality of known DNA sequences that may be present in the sample, for sequences predicted to produce one or more of said fragments of determined length, a sequence from said database being predicted to produce a fragment of determined length when the sequence from said database comprises recognition sites of said one or more restriction endonucleases spaced apart by the determined length, whereby DNA sequences in said sample are identified, classified, or quantified.

135. The method of claim 135 wherein the mammal is a human.
Specification