Method for finding a reference token sequence in an original token string within a database of token strings using appended non-contiguous substrings
Abstract
This method non-sequentially compares a reference sequence of tokens to an original sequence of tokens to determine subsequences of tokens that match exactly or approximately. The method takes a novel approach to creating a large number of indexes: strings of tokens are partitioned into substrings, non-contiguous substrings are appended together to form tuples, and indexes are created from the tuples. Indexes are created in this manner for both the original and reference strings. Techniques are also provided to locate, exactly or approximately, the substrings of the original sequence of tokens that were used to create the tuples and indexes. Original and reference indexes are compared and matches are tracked. Higher numbers of matches yield higher scores (votes) in a table and indicate a stronger similarity between the sequences in the original and reference strings. Using this method, the degree of similarity can also be determined. The method is useful when comparing a reference sequence of tokens to a large database of original strings of tokens. It has applications in the biological sciences (human genome mapping and protein analysis) and in image, speech, and music recognition.
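The tuple-forming idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the substring length, the use of two-substring tuples, and the helper names `partition` and `make_tuples` are hypothetical choices made for the example.

```python
from itertools import combinations

def partition(tokens, sub_len):
    """Split a token sequence into consecutive, non-overlapping substrings of
    sub_len tokens; a trailing remainder shorter than sub_len is dropped."""
    tokens = tuple(tokens)
    return [tokens[i:i + sub_len] for i in range(0, len(tokens) - sub_len + 1, sub_len)]

def make_tuples(tokens, sub_len, parts_per_tuple=2):
    """Append every combination of parts_per_tuple substrings into one tuple.

    Returns (substring positions, appended tokens, contiguous?) triples;
    combinations that skip a substring are the non-contiguous tuples."""
    subs = partition(tokens, sub_len)
    if len(subs) < 3:
        raise ValueError("need at least three substrings")
    result = []
    for combo in combinations(range(len(subs)), parts_per_tuple):
        appended = tuple(tok for i in combo for tok in subs[i])
        contiguous = all(b == a + 1 for a, b in zip(combo, combo[1:]))
        result.append((combo, appended, contiguous))
    return result

for combo, appended, contiguous in make_tuples("ACGTACGTA", sub_len=3):
    print(combo, "".join(appended), "contiguous" if contiguous else "non-contiguous")
# (0, 1) ACGTAC contiguous
# (0, 2) ACGGTA non-contiguous
# (1, 2) TACGTA contiguous
```

With n substrings there are n(n-1)/2 two-substring tuples, of which only n-1 are contiguous, so the non-contiguous tuples quickly dominate the index count as n grows.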
28 Claims
1. A method for finding a reference string of tokens in one or more original token strings within a database comprising the steps of:
creating one or more original tuples for each of the original token strings in the database by:
a. partitioning each original token string into three or more original substrings of contiguous tokens;
b. appending together two or more original substrings of the original token string to form one or more original tuples associated with the original token string, at least one of the original tuples being formed by appending together two or more non-contiguous original substrings of the original token string;
creating a unique original index for each original tuple created from the original token string by using an index algorithm, the original index being associated with the original token string from which the original tuple was created, each original index associated with information that is used to locate the original token string in the database containing the tuple from which the original index was derived and to determine the position of the matched reference sequence in the original token string;
creating one or more reference tuples from the reference string of tokens by:
c. partitioning the reference string of tokens into three or more reference substrings of contiguous tokens;
d. appending together two or more reference substrings to form one or more reference tuples, at least one of the reference tuples being formed by appending together two or more non-contiguous reference substrings;
creating a unique reference index for each reference tuple using the index algorithm;
comparing at least one reference index to at least one original index;
tracking the matches between the reference index and original index;
selecting an original token string in the database based on the number of matches between one or more original indexes and one or more reference indexes.
Dependent Claims (2, 3, 4, 5, 6, 7, 8)
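Claim 1 can be traced end to end with a toy sketch. Several choices here are assumptions rather than claim language: each string is scanned with a sliding window so that matches can be found at any offset, each window is partitioned into three substrings of `SUB_LEN` tokens, the "index algorithm" simply uses the pair positions plus the appended tokens as an exact dictionary key, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

SUB_LEN = 3  # tokens per substring (assumed)
N_SUBS = 3   # substrings per window; the claim requires three or more

def window_keys(tokens):
    """Yield (window start, index key) for every window of the token sequence.

    Each window is partitioned into N_SUBS contiguous substrings and each pair
    of substrings is appended into a tuple; pair (0, 2) is the non-contiguous
    one. The key (pair positions, appended tokens) is unique to its tuple."""
    span = SUB_LEN * N_SUBS
    for start in range(len(tokens) - span + 1):
        subs = [tuple(tokens[start + k * SUB_LEN:start + (k + 1) * SUB_LEN])
                for k in range(N_SUBS)]
        for pair in combinations(range(N_SUBS), 2):
            yield start, (pair, subs[pair[0]] + subs[pair[1]])

def build_original_index(database):
    """Map each original index key to the (string id, offset) pairs that produced it."""
    index = defaultdict(list)
    for string_id, tokens in database.items():
        for start, key in window_keys(tokens):
            index[key].append((string_id, start))
    return index

def select_original(index, reference):
    """Compare reference keys with original keys, tally matches per string,
    and return the string with the most matches together with the tally."""
    votes = Counter()
    for _, key in window_keys(reference):
        for string_id, _ in index.get(key, []):
            votes[string_id] += 1
    best = votes.most_common(1)[0][0] if votes else None
    return best, votes

database = {"s1": "GATTACAGATTACA", "s2": "CCCCGGGGTTTTAAAA"}
index = build_original_index(database)
best, votes = select_original(index, "TTACAGATT")
print(best, dict(votes))  # s1 {'s1': 3}
```

Using the tuple itself as the key sidesteps hash collisions; a production system would more likely map each tuple to a compact integer, as in the sequence-specific sketches for the later claims.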
9. A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising the steps of:
creating one or more original tuples for each of the original token strings in the database by:
a. partitioning each original token string into three or more original substrings of contiguous tokens;
b. appending together two or more original substrings of the original token string to form one or more original tuples associated with the original token string, at least one original tuple being formed by appending two or more non-contiguous original substrings of the original token string;
creating a unique original index for each original tuple created from the original token string by using an index algorithm, the original index being associated with the original token string from which the original tuple was created;
using the original index to point to a cell in a first memory look-up structure and storing in the cell an information record associated with the original string, the information record containing pointing information used to locate the original token string in the database containing the tuple from which the original index was derived and displacement information used to determine the position of the matched reference sequence in the original token string;
creating one or more reference tuples from the reference string of tokens by:
c. partitioning the reference string of tokens into three or more reference substrings of contiguous tokens;
d. appending together two or more reference substrings to form one or more reference tuples, at least one of the reference tuples being formed by appending together two or more non-contiguous reference substrings;
creating a unique reference index for each reference tuple using the index algorithm;
comparing at least one reference index to at least one original index using the memory look-up structure;
tracking the matches between the reference index and original index;
storing the tracking results in a second memory look-up structure;
selecting an original token string in the database based on the number of matches between one or more original indexes and one or more reference indexes.
Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
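Claim 9 adds two look-up structures and an information record carrying pointing and displacement information. A minimal sketch of how displacement could recover the match position, assuming a Python dict stands in for each memory look-up structure and that votes are tallied per (string id, shift) pair: the shift with the most votes is taken as the start of the reference inside the original. `InfoRecord`, `build_first_structure`, and `locate` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import NamedTuple

SUB_LEN = 2  # tokens per substring (assumed)
N_SUBS = 3   # substrings per window (assumed)

class InfoRecord(NamedTuple):
    string_id: str     # pointing information: which original string
    displacement: int  # start of the window that produced the tuple

def window_keys(tokens):
    """Yield (window start, key) for each sliding window, pairing its substrings."""
    span = SUB_LEN * N_SUBS
    for start in range(len(tokens) - span + 1):
        subs = [tuple(tokens[start + k * SUB_LEN:start + (k + 1) * SUB_LEN])
                for k in range(N_SUBS)]
        for pair in combinations(range(N_SUBS), 2):
            yield start, (pair, subs[pair[0]] + subs[pair[1]])

def build_first_structure(database):
    """First look-up structure: the cell addressed by an original index holds
    the information records of every tuple that produced that index."""
    cells = defaultdict(list)
    for string_id, tokens in database.items():
        for start, key in window_keys(tokens):
            cells[key].append(InfoRecord(string_id, start))
    return cells

def locate(cells, reference):
    """Second look-up structure: votes keyed by (string id, shift), where
    shift = original displacement - reference window start."""
    votes = Counter()
    for ref_start, key in window_keys(reference):
        for record in cells.get(key, []):
            votes[(record.string_id, record.displacement - ref_start)] += 1
    return votes

database = {"a": "XXXXHELLOWORLDXX", "b": "ABCDEFGHIJKLMNOP"}
cells = build_first_structure(database)
votes = locate(cells, "HELLOWORLD")
print(votes.most_common(1))  # [(('a', 4), 15)]
```

Every one of the 15 reference tuples (5 windows times 3 pairs) votes for the same shift, so the winner both selects string `a` and places the reference at offset 4 within it.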
23. A method for recognizing and accessing a reference string of nucleotides in one or more original DNA strings within a database comprising the steps of:
creating one or more original tuples for each of the original DNA strings in the database by:
a. partitioning each original DNA string into three or more substrings of contiguous nucleotides;
b. appending together two or more original DNA substrings of the original DNA string to form one or more original tuples associated with each original DNA string;
creating a unique original index for each original tuple created from the original DNA string using an index algorithm, the original index being associated with the original DNA string from which the original tuple was created;
creating one or more reference tuples from the reference string of tokens by:
c. partitioning the reference string of nucleotides into three or more reference substrings of contiguous nucleotides;
d. appending together two or more reference substrings to form one or more reference tuples, at least one of the reference tuples being formed by appending together two or more non-contiguous reference substrings;
creating a unique reference index for each reference tuple using the index algorithm;
comparing at least one reference index to at least one original index to determine if the indexes match;
tracking the matches between the reference index and original index;
selecting an original DNA string in the database based on the number of matches between one or more original indexes and one or more reference indexes.
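The claims leave the index algorithm open. For DNA, one algorithm that does satisfy the "unique index" requirement (an assumption here, not something the claim specifies) is packing each nucleotide into 2 bits. The function name `dna_index` is hypothetical, and ambiguous bases such as `N` are simply rejected in this sketch.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit code per nucleotide (assumed)

def dna_index(tuple_seq):
    """Pack a nucleotide tuple into an integer, 2 bits per base.

    For tuples of one fixed length n this is a bijection onto range(4 ** n),
    so distinct tuples of that length always receive distinct indexes."""
    index = 0
    for base in tuple_seq:
        if base not in CODE:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        index = (index << 2) | CODE[base]
    return index

# Appending the non-contiguous substrings "ACG" and "GTA" gives the tuple "ACGGTA".
print(dna_index("ACG" + "GTA"))  # 428
```

Uniqueness holds only within one tuple length ("A" and "AA" both pack to 0), so tuples of different lengths or different substring patterns would need separate tables or an extra tag in the index.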
24. A method for recognizing and accessing a reference string of amino acids in one or more original protein strings within a database comprising the steps of:
creating one or more original tuples for each of the original protein strings in the database by:
a. partitioning each original protein string into three or more substrings of contiguous amino acids;
b. forming one or more original tuples associated with each original protein string by appending together two or more original amino acid substrings of the original string, one or more of the original tuples being formed by appending together at least two non-contiguous original amino acid substrings;
creating a unique original index for each original tuple created from the original protein string using an index algorithm, the original index being associated with the original protein string from which the original tuple was created;
creating one or more reference tuples from the reference string of tokens by:
c. partitioning the reference string of amino acids into three or more contiguous reference substrings of amino acids;
d. forming two or more reference tuples by appending together two or more reference substrings, one or more of the reference tuples being formed by appending two or more non-contiguous reference substrings;
creating a unique reference index for each reference tuple using the index algorithm;
comparing at least one reference index to at least one original index;
tracking the matches between the reference index and original index;
selecting an original protein string in the database based on the number of matches between one or more original indexes and one or more reference indexes.
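For proteins the alphabet has 20 symbols, so bit packing no longer fits neatly; a mixed-radix (base-20) index is one possible unique index algorithm. This is a hedged sketch: the alphabet ordering and the name `radix_index` are assumptions, and it also shows how large a directly addressed table would have to be.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
RANK = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def radix_index(tuple_seq, rank=RANK):
    """Read a tuple as a number written in base len(rank); for a fixed tuple
    length n the result is unique and lies in range(len(rank) ** n)."""
    base = len(rank)
    index = 0
    for symbol in tuple_seq:
        index = index * base + rank[symbol]
    return index

# Two non-contiguous 3-residue substrings appended into one 6-residue tuple.
tuple_seq = "MKT" + "LLV"
print(radix_index(tuple_seq), len(AMINO_ACIDS) ** len(tuple_seq))
```

A directly addressed table over all 6-residue tuples would need 20^6 = 64,000,000 cells, which is one reason a sparse dictionary, or shorter substrings, may be preferable in practice.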
25. A method for recognizing and accessing a reference string of characters in one or more original character strings within a database comprising the steps of:
creating one or more original tuples for each of the original character strings in the database by:
a. partitioning each original character string into three or more substrings of contiguous characters;
b. forming one or more original tuples associated with each original character string by appending together two or more original character substrings of the original string, one or more of the original tuples being formed by appending together two or more non-contiguous original character substrings;
creating a unique original index for each original tuple created from the original character string using an index algorithm, the original index being associated with the original character string from which the original tuple was created;
creating one or more reference tuples from the reference string of tokens by:
c. partitioning the reference string of characters into three or more non-contiguous reference substrings of characters;
d. forming two or more reference tuples by appending together two or more reference substrings, one or more of the reference tuples being formed by appending two or more non-contiguous reference substrings;
creating a unique reference index for each reference tuple using the index algorithm;
comparing at least one reference index to at least one original index;
tracking the matches between the reference index and original index;
selecting an original character string in the database based on the number of matches between one or more original indexes and one or more reference indexes.
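A small character-string example shows why non-contiguous tuples help with approximate matching: a tuple that skips a substring still matches when only the skipped substring differs. The fraction of reference keys found in the original is used below as a stand-in for the "degree of similarity"; that measure and the helper names `tuple_keys` and `similarity` are assumptions for illustration.

```python
from itertools import combinations

def tuple_keys(text, sub_len=3, n_subs=3):
    """Index keys for every window of text: each window is split into n_subs
    substrings and every pair of substrings is appended into one key."""
    span = sub_len * n_subs
    keys = set()
    for start in range(len(text) - span + 1):
        subs = [text[start + k * sub_len:start + (k + 1) * sub_len] for k in range(n_subs)]
        for i, j in combinations(range(n_subs), 2):
            keys.add(((i, j), subs[i] + subs[j]))
    return keys

def similarity(original, reference):
    """Fraction of the reference's keys that also occur in the original."""
    ref_keys = tuple_keys(reference)
    return len(ref_keys & tuple_keys(original)) / len(ref_keys) if ref_keys else 0.0

original = "abcdefghi"
reference = "abcXefghi"  # one substituted character in the middle substring
print(tuple_keys(reference) & tuple_keys(original))  # {((0, 2), 'abcghi')}
print(similarity(original, reference))  # 0.3333333333333333
```

Only the non-contiguous tuple (0, 2) survives the substitution; with purely contiguous tuples this reference would score zero against the original.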
26. A method for recognizing and accessing a reference string of phonemes in one or more original phoneme strings within a database comprising the steps of:
creating one or more original tuples for each of the original phoneme strings in the database by:
a. partitioning each original phoneme string into three or more original substrings of contiguous phonemes;
b. forming one or more original tuples associated with each original phoneme string by appending together two or more original substrings of the original string, one or more of the original tuples being formed by appending together at least two non-contiguous original substrings;
creating a unique original index for each original tuple created from an original phoneme string using an index algorithm, the original index being associated with the original phoneme string from which the original tuple was created;
creating one or more reference tuples from the reference string of phonemes by:
c. partitioning the reference string of phonemes into three or more contiguous reference substrings of phonemes;
d. forming two or more reference tuples by appending together two or more reference substrings, one or more of the reference tuples being formed by appending two or more non-contiguous reference substrings;
creating a unique reference index for each reference tuple using the index algorithm;
comparing at least one reference index to at least one original index;
tracking the matches between the reference index and original index;
selecting an original phoneme string in the database based on the number of matches between one or more original indexes and one or more reference indexes.
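Phonemes are often written as multi-character symbols, so when substrings are appended, the tuple should remain a sequence of phoneme tokens rather than a flat character string; otherwise distinct tuples can receive the same index. The sketch assumes ARPAbet-style symbols (the claim does not name a phoneme set), and `appended_key` and `naive_key` are hypothetical helpers.

```python
def appended_key(*substrings):
    """Append substrings of phoneme tokens while keeping token boundaries."""
    return tuple(token for sub in substrings for token in sub)

def naive_key(*substrings):
    """Join the symbols into one character string, losing token boundaries."""
    return "".join(token for sub in substrings for token in sub)

x = ("N", "G"), ("AA", "NG")  # two substrings of two phoneme tokens each
y = ("NG", "AA"), ("N", "G")

print(naive_key(*x) == naive_key(*y))        # True: both give "NGAANG"
print(appended_key(*x) == appended_key(*y))  # False: the token sequences differ
```

Mapping each phoneme symbol to a small integer before indexing achieves the same boundary safety and makes the radix-style indexes sketched for other token types applicable.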
27. A method for recognizing and accessing a reference string of notes in one or more original note strings within a database comprising the steps of:
creating one or more original tuples for each of the original note strings in the database by:
a. partitioning each original note string into three or more original substrings of contiguous notes;
b. forming one or more original tuples associated with each original note string by appending together two or more original substrings of the original string, one or more of the original tuples being formed by appending together at least two non-contiguous original substrings;
creating a unique original index for each original tuple created from the original note string using an index algorithm, the original index being associated with the original note string from which the original tuple was created;
creating one or more reference tuples from the reference string of notes by:
c. partitioning the reference string of notes into three or more contiguous reference substrings of notes;
d. forming two or more reference tuples by appending together two or more reference substrings, one or more of the reference tuples being formed by appending two or more non-contiguous reference substrings;
creating a unique reference index for each reference tuple using the index algorithm;
comparing at least one reference index to at least one original index;
tracking the matches between the reference index and original index;
selecting an original note string in the database based on the number of matches between one or more original indexes and one or more reference indexes.
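The claims allow a tuple to append "two or more" substrings. This sketch uses three-substring tuples drawn from four substrings of a melody, and it assumes notes are encoded as MIDI note numbers packed 7 bits apiece into an integer index. Both encodings and the helper names `note_tuples` and `pack_notes` are illustrative assumptions.

```python
from itertools import combinations

def note_tuples(midi_notes, sub_len=2, parts_per_tuple=3):
    """Partition a melody of MIDI note numbers into substrings of sub_len notes
    and append every combination of parts_per_tuple substrings into a tuple."""
    subs = [tuple(midi_notes[i:i + sub_len])
            for i in range(0, len(midi_notes) - sub_len + 1, sub_len)]
    return {combo: tuple(n for i in combo for n in subs[i])
            for combo in combinations(range(len(subs)), parts_per_tuple)}

def pack_notes(notes):
    """Pack MIDI note numbers (0 to 127, 7 bits each) into one integer index."""
    index = 0
    for note in notes:
        if not 0 <= note <= 127:
            raise ValueError(f"not a MIDI note number: {note}")
        index = (index << 7) | note
    return index

melody = [60, 62, 64, 65, 67, 69, 71, 72]  # C major scale, C4 to C5
for combo, notes in note_tuples(melody).items():
    print(combo, notes, pack_notes(notes))
```

Of the four tuples, (0, 1, 3) and (0, 2, 3) skip a substring and are therefore the non-contiguous ones.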
28. A computer system for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising:
a database having a set of original token strings;
a means for creating at least one original tuple for each of the original token strings in the database, the tuple formed by:
a. partitioning each original token string into three or more contiguous original substrings of tokens;
b. forming one or more original tuples associated with each original string by appending together two or more original substrings of the original string, one or more of the original tuples being formed by appending together at least two non-contiguous original substrings;
a unique original index for each original tuple created from the original string using an index algorithm, the original index being associated with the original string from which the original tuple was created;
a first memory look-up structure with cells, the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created;
one or more reference tuples created from the reference string of tokens by:
c. partitioning the reference string of tokens into three or more non-contiguous reference substrings of tokens;
d. forming the reference tuples by appending together at least two reference substrings, one or more of the reference tuples being formed by appending two or more non-contiguous reference substrings;
a unique reference index for each reference tuple created using the index algorithm, the reference index compared to at least one reference index to at least one original index;
a second memory look-up structure for tracking matches between the reference index and original index, an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes.
Specification