Oligoprobe designstation: a computerized method for designing optimal DNA probes
First Claim
1. A programmed computer system for designing optimal oligonucleotide probes for use with a gene sequence data source comprising:
- first input means for introducing user-selected gene sequences into the computer system;
memory means for storing user-selected gene sequences;
means for accessing gene sequence data from said gene sequence data source;
means for performing hybridization strength modeling on gene sequences which is determined by numbers of matching between gene sequences;
means for performing hybridization strength modeling on gene sequences which is determined by melting temperature (Tm);
means for selecting either of said modeling means; and
means for presenting the results of said modeling to present candidate oligonucleotide probes with a candidate oligonucleotide probe hybridization shown as numbers of hybridizations as compared to said gene sequence data from said gene sequence data source.
Abstract
There is disclosed herein an invention which relates to the fields of genetic engineering, microbiology, and computer science, that allows a user, whether they be a molecular biologist or a clinical diagnostician, to calculate and design extremely specific oligonucleotide probes for DNA and mRNA hybridization procedures. The probes designed with this invention may be used for medical diagnostic kits, DNA identification, and potentially continuous monitoring of metabolic processes in human beings. The key features design oligonucleotide probes based on the GenBank database of DNA and mRNA sequences and examine candidate probes for specificity or commonality with respect to a user-selected experimental preparation. Two models are available: a Mismatch Model, that employs hashing and continuous seed filtration, and an H-Site Model, that analyzes candidate probes for their binding specificity relative to some known set of mRNA or DNA sequences. The preferred embodiment of this computerized design tool is written in the Borland® C++ language and runs under Microsoft® Windows™ on IBM® compatible personal computers.
97 Claims
-
1. A programmed computer system for designing optimal oligonucleotide probes for use with a gene sequence data source comprising:
-
first input means for introducing user-selected gene sequences into the computer system; memory means for storing user-selected gene sequences; means for accessing gene sequence data from said gene sequence data source; means for performing hybridization strength modeling on gene sequences which is determined by numbers of matching between gene sequences; means for performing hybridization strength modeling on gene sequences which is determined by melting temperature (Tm); means for selecting either of said modeling means; and means for presenting the results of said modeling to present candidate oligonucleotide probes with a candidate oligonucleotide probe hybridization shown as numbers of hybridizations as compared to said gene sequence data from said gene sequence data source. - View Dependent Claims (2, 48, 49, 50)
-
-
3. A programmed computer system for designing optimal oligonucleotide probes for use with a gene sequence data source comprising:
-
first input means for introducing user-selected gene sequences into the computer system; memory means for storing user-selected gene sequences; means for accessing gene sequence data from said gene sequence data source; means for performing hybridization strength modeling on gene sequences which is determined by numbers of matching between gene sequences; means for presenting the results of said modeling to present candidate oligonucleotide probes with a candidate oligonucleotide probe hybridization shown as numbers of hybridizations as compared to gene sequence data from said gene sequence data source; wherein said means for performing hybridization strength modeling on gene sequences which is determined by numbers of matching utilizes said accessing means to introduce a user-selected set of gene sequence data and a user-selected set of target gene sequence data from said gene sequence data source into the computer system and said memory means to store said gene sequence data and said target gene sequence data; wherein said means for performing said hybridization strength modeling on gene sequences which is determined by numbers of matching includes: means for determining a minimum oligonucleotide probe length; means for creating a look-up hash table and linked list in memory for each gene sequence in said gene sequence data and each of said target gene sequences; means for calculating the minimum length of any matching gene subsequence of said gene sequence data and said target gene sequence data; means for comparing each base pair character in each said target sequence stored in a hash table in memory to each base pair character of said gene sequence stored in a hash table in memory; means for finding a matching seed by determining if said comparison results in a matching seed subsequence of length equal to said calculated minimum length; means for comparing base pair characters behind and ahead of said seed to determine if there exists an extended match of a subsequence of base pair characters of length greater than the calculated minimum length, resulting in a current hit sequence; means for calculating whether said current hit sequence is longer than said minimum oligonucleotide probe length, resulting in a current candidate oligonucleotide probe; means for storing said current candidate oligonucleotide probe; wherein said presenting means provides said current candidate oligonucleotide probe to the user. - View Dependent Claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
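The seed-and-extend steps recited in claim 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of a dictionary in place of the claimed hash table and linked list, the restriction to exact matches, and the choice to accept hits at least as long as the minimum probe length are all assumptions made for illustration.

```python
from collections import defaultdict


def build_seed_index(sequence, seed_len):
    """Map every seed_len-mer in `sequence` to the positions where it starts.

    A dict of lists stands in for the hash table plus linked list (assumption).
    """
    index = defaultdict(list)
    for i in range(len(sequence) - seed_len + 1):
        index[sequence[i:i + seed_len]].append(i)
    return index


def find_candidate_probes(target, gene, seed_len, min_probe_len):
    """Return sorted (target_start, gene_start, length) exact matches.

    Only matches of length >= min_probe_len are kept (assumed threshold rule).
    """
    index = build_seed_index(gene, seed_len)
    hits = set()
    for t in range(len(target) - seed_len + 1):
        for g in index.get(target[t:t + seed_len], ()):
            # Extend the match behind the seed.
            ts, gs = t, g
            while ts > 0 and gs > 0 and target[ts - 1] == gene[gs - 1]:
                ts -= 1
                gs -= 1
            # Extend the match ahead of the seed.
            te, ge = t + seed_len, g + seed_len
            while te < len(target) and ge < len(gene) and target[te] == gene[ge]:
                te += 1
                ge += 1
            if te - ts >= min_probe_len:
                # A set removes duplicates found from several seeds of one hit.
                hits.add((ts, gs, te - ts))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_candidate_probes("TTACGTACGGA", "GGACGTACGCC", 4, 6):
        print(hit)
```

Indexing one sequence once and scanning the other keeps the seed search roughly linear in sequence length; the extension loops run only where a seed already matches.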
-
-
25. A programmed computer system for designing optimal oligonucleotide probes for use with a gene sequence data source comprising:
-
first input means for introducing user-selected gene sequences into the computer system; memory means for storing user-selected gene sequences; means for accessing gene sequence data from said gene sequence data source; means for performing hybridization strength modeling on gene sequences which is determined by melting temperature (Tm); means for selecting either of said modeling means; means for presenting the results of said modeling to present candidate oligonucleotide probes; wherein said means for performing hybridization strength modeling utilizes said first input means to introduce a user-selected screening threshold into the computer system and said accessing means to introduce a user-selected set of gene sequence data and a user-selected set of target gene sequence data from said gene sequence data source into the computer system and said memory means to store said gene sequence data, said target gene sequence data and said screening threshold and wherein said means for performing hybridization strength modeling comprises: means for preprocessing said target gene sequence data and said gene sequence data by selecting only those sequences without introns; means for forming a preparation file of gene sequence fragments by cutting said target gene sequences into fixed length target gene subsequences and sorting said subsequences in lexicographical order; means for merge sorting said gene sequences; means for forming multiple lists of screens by forming lists of subsequences of the preparation file of length equal to said screening threshold; means for indexing, sorting and storing said screens in said memory means; means for sequentially comparing said preparation file gene sequences with each of said screens to design candidate oligonucleotide probes; means for calculating the hybridization strengths between a gene sequence and all candidate oligonucleotide probes containing that gene sequence by accounting for Guanine-Cytosine (GC) and Adenine-Thymine (AT) base pair content of the gene sequence and the number of mismatches between said preparation file sequences and a said screen when said comparison results in a match; means for preparing the candidate oligonucleotide probe and hybridization strength for presentation to the user; wherein said presenting means provides the candidate oligonucleotide probe and hybridization strength to the user. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45)
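Claim 25 recites a hybridization strength that accounts for GC content, AT content, and the number of mismatches, but the claim text gives no formula. The sketch below is one possible reading under stated assumptions: it applies the well-known Wallace rule of thumb for short oligonucleotides (about 2 °C per A/T pair and 4 °C per G/C pair) to the matched positions only, and counts mismatched positions separately. The function name and the treatment of mismatches as contributing nothing are hypothetical choices, and both inputs are written on the same strand so that a "match" means identical characters.

```python
def hybridization_strength(probe, site):
    """Estimate (approximate Tm in deg C, mismatch count) for an aligned pair.

    Assumptions (not from the patent): Wallace rule, 2 per matched A/T and
    4 per matched G/C; mismatched positions add nothing to the estimate.
    """
    if len(probe) != len(site):
        raise ValueError("probe and site must be aligned and of equal length")
    at_pairs = gc_pairs = mismatches = 0
    for p, s in zip(probe.upper(), site.upper()):
        if p != s:
            mismatches += 1
        elif p in "AT":
            at_pairs += 1
        elif p in "GC":
            gc_pairs += 1
        else:
            raise ValueError(f"unexpected base: {p!r}")
    return 2 * at_pairs + 4 * gc_pairs, mismatches


if __name__ == "__main__":
    print(hybridization_strength("ACGTACGT", "ACGTACGT"))
    print(hybridization_strength("ACGTACGT", "ACGTACGA"))
```

A GC-rich probe scores higher than an AT-rich probe of the same length, which is the qualitative behavior the claim language describes.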
-
-
46. A programmed computer system for designing optimal oligonucleotide probes for use with a gene sequence data source comprising:
-
first input means for introducing user-selected gene sequences into the computer system; memory means for storing user-selected gene sequences; means for accessing gene sequence data from said gene sequence data source; means for performing hybridization strength modeling on gene sequences which is determined by numbers of matching sequences between gene sequences; and means for presenting the results of said modeling to present candidate oligonucleotide probes with a candidate oligonucleotide hybridization shown as numbers of hybridized probe's gene sequences from said gene sequence data source and numbers of matched nucleotides.
-
-
47. A programmed computer system for designing optimal oligonucleotide probes for use with a gene sequence data source comprising:
-
first input means for introducing user-selected gene sequences into the computer system; memory means for storing user-selected gene sequences; means for accessing gene sequence data from said gene sequence data source; means for performing hybridization strength modeling on gene sequences which is determined by melting temperature (Tm); and means for presenting the results of said modeling to present candidate oligonucleotide probes with a candidate oligonucleotide probe hybridization shown as numbers of hybridization to gene sequence data from said gene sequence data source.
-
-
51. A programmed computer system for designing candidate oligonucleotide probes for use with a gene sequence data source including:
-
first input means for introducing user-selected gene sequence, design, model and presentation criteria and a user-specified oligonucleotide probe length into the computer system; memory means for storing said gene sequence, design, model and presentation criteria and said oligonucleotide probe length; means for accessing gene sequence data from said gene sequence data source; wherein said accessing means is operative to introduce a user-selected set of gene sequence data and a user-selected set of target gene sequence data from said gene sequence data source into the computer system; wherein said criteria are used for comparison of gene sequence data and target gene sequence data; means for comparing said gene sequences against said target gene sequences employing said criteria; means for calculating candidate oligonucleotide probes of said oligonucleotide probe length that are either common to a pool of user-specified gene sequences or specific to a particular user-specified gene sequence; means for calculating the homology between the candidate oligonucleotide probes and said gene sequence data; means for calculating a candidate oligonucleotide probe's hairpin characteristics; means for displaying in multiple dimensions the gene sequences which result from the comparisons and calculations characterized in that said display format exhibits: the starting position of each candidate oligonucleotide probe in one dimension; a candidate oligonucleotide probe's specificity to the target gene sequence in a second dimension; and superimposed melting temperatures of gene sequences in contrasting presentations in at least an apparent third dimension; wherein said display further includes a cursor moveable along one dimension of said display that selects a position for an expansion of data representing the homology between the candidate oligonucleotide probes and said gene sequence data; wherein said display means displays in alphanumeric form the homology between the candidate oligonucleotide probes and said gene sequence data; wherein said display provides an expansion of data including presenting hybridizations at various melting temperatures for all candidate oligonucleotide probes; the location of each hybridization; a candidate oligonucleotide probe's starting position; and hairpin characteristics of each candidate oligonucleotide probe.
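Claim 51 mentions calculating a probe's hairpin characteristics without saying how. As an illustration only, the following sketch reports the longest self-complementary stem a probe could form, which is one common way to flag hairpin-prone oligonucleotides. The function name, the brute-force search, and the default minimum loop size of three bases are assumptions, not details taken from the claims.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def longest_hairpin_stem(probe, min_loop=3):
    """Return (stem_length, left_start, right_end) for the longest stem.

    Base i+k pairs with base j-k for k = 0, 1, ...; at least `min_loop`
    unpaired bases must remain between the two arms. Returns (0, -1, -1)
    when no complementary pair satisfies the loop constraint.
    """
    probe = probe.upper()
    n = len(probe)
    best = (0, -1, -1)
    for i in range(n):
        for j in range(i + 1, n):
            k = 0
            # Grow the stem inward while bases pair and the loop stays open.
            while ((j - k) - (i + k) - 1 >= min_loop
                   and COMPLEMENT.get(probe[i + k]) == probe[j - k]):
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


if __name__ == "__main__":
    print(longest_hairpin_stem("GGGCAAAAGCCC"))
```

The search is cubic in probe length in the worst case, which is acceptable for oligonucleotides of a few dozen bases; a production tool would more likely score stems thermodynamically.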
-
-
52. A method for designing candidate oligonucleotide probes by performing hybridization strength modeling on gene sequences which is determined by numbers of matching sequences for use with a gene sequence data source comprising the steps of:
-
introducing user-selected gene sequences into a computer system; accessing gene sequence data from said gene sequence data source; storing said user-selected gene sequence in the memory of the computer system; accessing the gene sequence source to introduce the user-selected set of gene sequence data and a user-selected set of target gene sequence data from said gene sequence data source into the computer system; storing said gene sequence data and said target gene sequence data in the memory of the computer system; determining a minimum oligonucleotide probe length; creating a look-up hash table and linked list in memory for each gene sequence in said gene sequence data and each of said target gene sequences; calculating the minimum length of any matching gene subsequence of said gene sequence data and said target gene sequence data; comparing each base pair character in each said target sequence stored in a hash table in memory to each base pair character of said gene sequence stored in a hash table in memory; determining a matching seed by determining if the said comparison results in a matching gene subsequence of length equal to said calculated minimum length; comparing base pair characters behind and ahead of said seed to determine if there exists an extended match of a subsequence of base pair characters of length greater than the calculated minimum length, resulting in a current hit sequence; calculating whether said current hit sequence is longer than said minimum oligonucleotide probe length, resulting in a current candidate oligonucleotide probe; storing said current candidate oligonucleotide probe in the memory of the computer system; and presenting a representation of said current candidate oligonucleotide probe to the user. - View Dependent Claims (53, 54, 55, 56, 57, 58, 59, 60, 61, 62)
-
-
63. A method for designing candidate oligonucleotide probes by performing hybridization strength modeling which is determined by melting temperature (Tm) for use with a gene sequence data source comprising the steps of:
-
introducing user-selected gene sequence and a user-selected screening threshold into a computer system; storing user-selected gene sequence and said screening threshold in the memory of the computer system; accessing the gene sequence source to introduce the user-selected set of gene sequence data and a user-selected set of target gene sequence data from said gene sequence data source into the computer system; storing said gene sequence data and said target gene sequence data in the memory of the computer system; preprocessing said target gene sequence data and said gene sequence data by selecting only those sequences without introns; forming a preparation file of gene sequence fragments by cutting said target gene sequences into fixed length target gene subsequences and sorting said subsequences in lexicographical order; merge sorting said gene sequences; forming multiple lists of screens by forming lists of subsequences of the preparation file of length equal to said screening threshold; indexing and sorting said screens in memory;
storing said screens in the memory of the computer system; sequentially comparing said preparation file gene sequences with each of said screens to design candidate oligonucleotide probes; calculating the hybridization strengths between a gene sequence and all candidate oligonucleotide probes containing that gene sequence by accounting for Guanine-Cytosine (GC) and Adenine-Thymine (AT) base pair content of the gene sequence and the number of mismatches between said preparation file sequences and a said screen when said comparison results in a match; preparing the candidate oligonucleotide probe and hybridization strength for presentation to the user; and presenting the candidate oligonucleotide probe and hybridization strength to the user. - View Dependent Claims (64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83)
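The screen-building steps of claim 63 (forming subsequences whose length equals the screening threshold, then indexing and sorting them) might be sketched as follows. This is a simplified reading, not the patented method: the tuple layout, the function names, and the exact-match lookup by binary search are assumptions, and the claimed mismatch-tolerant comparison is not modeled.

```python
from bisect import bisect_left


def build_screens(fragments, threshold):
    """Return a sorted list of (screen, fragment_index, offset) tuples.

    Each screen is a substring of length `threshold` taken from one
    preparation-file fragment (assumed representation).
    """
    screens = []
    for f_idx, fragment in enumerate(fragments):
        for offset in range(len(fragment) - threshold + 1):
            screens.append((fragment[offset:offset + threshold], f_idx, offset))
    screens.sort()  # lexicographic order on the screen text, then position
    return screens


def lookup(screens, word):
    """Return [(fragment_index, offset), ...] for exact occurrences of `word`."""
    # (word,) sorts before every (word, f_idx, offset) tuple, so bisect_left
    # lands on the first entry whose screen text equals `word`, if any.
    i = bisect_left(screens, (word,))
    found = []
    while i < len(screens) and screens[i][0] == word:
        found.append(screens[i][1:])
        i += 1
    return found


if __name__ == "__main__":
    table = build_screens(["ACGTAC", "GTACGG"], 4)
    print(lookup(table, "GTAC"))
```

Sorting once lets each later comparison locate its screen in logarithmic time instead of scanning every fragment.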
-
-
84. A method for designing candidate oligonucleotide probes for use with a gene sequence data source comprising the steps of:
-
introducing user-selected gene sequence and a user-specified oligonucleotide probe length into a computer system; storing said gene sequence and said oligonucleotide probe length in the memory of the computer system; accessing gene sequence data from said gene sequence data source; accessing the gene sequence source to introduce the user-selected set of gene sequence data and a user-selected set of target gene sequence data from said gene sequence data source into the computer system; comparing said gene sequences against said target gene sequences employing said criteria; calculating candidate oligonucleotide probes of said probe length that are either common to a pool of user-specified gene sequences or specific to a particular user-specified gene sequence; calculating the homology between the candidate oligonucleotide probes and said gene sequence data; displaying in multiple dimensions the gene sequences which result from the comparisons and calculations characterized in that said display format exhibits: the starting position of each candidate oligonucleotide probe in one dimension; a candidate oligonucleotide probe's specificity to the target gene sequence in a second dimension; and superimposed melting temperatures of gene sequences in contrasting presentations in at least an apparent third dimension. - View Dependent Claims (85, 86, 87, 88, 89, 90, 91, 92, 93, 94)
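Claim 84 distinguishes probes that are common to a pool of sequences from probes that are specific to one sequence. Under the simplifying assumption that "common" and "specific" refer to exact occurrences of a fixed-length subsequence, the distinction reduces to set intersection and set difference, as in this sketch. The helper names are hypothetical, and the claimed homology and melting-temperature calculations are left out.

```python
def kmers(sequence, k):
    """Return the set of all length-k subsequences of `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def common_probes(pool, probe_len):
    """Candidates that occur exactly in every sequence of the pool."""
    sets = [kmers(seq, probe_len) for seq in pool]
    return sorted(set.intersection(*sets)) if sets else []


def specific_probes(target, others, probe_len):
    """Candidates that occur in `target` and in none of `others`."""
    background = set().union(*(kmers(seq, probe_len) for seq in others))
    return sorted(kmers(target, probe_len) - background)


if __name__ == "__main__":
    pool = ["AACGTT", "CCACGTGG", "ACGTA"]
    print(common_probes(pool, 4))
    print(specific_probes(pool[0], pool[1:], 4))
```

Exact matching is the strictest possible notion of specificity; a probe that differs from a background sequence by a single base would still count as specific here even though it might cross-hybridize in practice.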
-
-
95. A method of creating a preparation file from a user-selected set of target gene sequence data comprising:
-
locating the origin of subsequences in a set position of said target gene sequence in a preparation file; cutting said target gene sequence data into fixed-length subsequences; said subsequences beginning every preselected number of positions of said target gene sequence in said preparation file; sorting said subsequences in said preparation file in lexicographical order beginning at a set position; and storing said subsequences in a preparation file.
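The steps of claim 95 (pick an origin, cut fixed-length subsequences every preselected number of positions, sort lexicographically from a set position) can be sketched as below. The function and parameter names are hypothetical, the result is returned as an in-memory list rather than written to a file, and sorting on the text from `sort_from` onward is one assumed reading of "beginning at a set position".

```python
def make_preparation(target, origin, frag_len, step, sort_from=0):
    """Cut `target` into fixed-length fragments and sort them.

    origin    - index of the first fragment's start (0-based, assumption)
    frag_len  - length of every fragment
    step      - distance between consecutive fragment starts
    sort_from - offset within each fragment where the sort key begins
    """
    if frag_len <= 0 or step <= 0:
        raise ValueError("frag_len and step must be positive")
    fragments = [
        target[i:i + frag_len]
        for i in range(origin, len(target) - frag_len + 1, step)
    ]
    # Lexicographic order on the portion of each fragment after sort_from.
    return sorted(fragments, key=lambda frag: frag[sort_from:])


if __name__ == "__main__":
    for fragment in make_preparation("GATTACAGATTACA", origin=2, frag_len=4, step=3):
        print(fragment)
```

Only full-length fragments are kept, so a trailing stretch shorter than `frag_len` is dropped; whether the patented method pads or discards such a tail is not stated in the claim.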
-
-
96. A method of creating a preparation file from a user-selected set of target gene sequence data comprising:
-
locating the origin of a subsequence in a set position of said target gene sequence in said preparation file wherein the origin of said subsequence is located at position 40 of said target sequence in said preparation file; cutting a subsequence of said target gene sequence data into a fixed-length subsequence on the order of 96 base pairs in length; cutting successive sequences that are a fixed length long every preselected number of positions of said target gene sequence in said preparation file; sorting said subsequences in said preparation file in lexicographical order beginning at a set position; and storing said subsequences in a preparation file. - View Dependent Claims (97)
-
Specification