Method of identifying pattern matches in parameterized strings and square matrices
First Claim
1. A computer implemented method of finding maximal matches in a data string comprising the steps of:
- creating a suffix tree representing the data string;
for each internal node in said suffix tree, generating a list having an entry for each suffix associated with said node, each entry indicating (i) the position of the start of its associated suffix in said data string and (ii) the left context relative to the position, each node having a pathlength indicating the number of symbols between the root of said suffix tree and said node;
recognizing matches having the length of the pathlength of said node for each pair of entries having different left context;
generating a list of said matches; and
reporting said list of matches.
6 Assignments
0 Petitions
Accused Products
Abstract
Methods are disclosed for finding maximal matches in data strings and for finding matches in parameterized strings, that is, strings containing symbols from more than one alphabet in which the symbols from one of the alphabet are treated as parameters. In general, such maximal matches are found by creating a suffix tree representing the data string, generating lists for each node in the tree indicating the left contexts of all suffixes associated with that node and reporting matches for pairs of suffixes having different left contexts. One method of finding parameterized matches is to substitute a common symbol for the symbols of the alphabet representing the parameters before creating the suffix tree and then discarding matches found for which the actual parameters are not consistent. Another, preferred, method of finding parameterized matches is to substitute integers for the symbols of the alphabet representing the parameters, such symbols being chosen to create a linked list in the data string for each different symbol in such alphabet. Matches found by the suffix tree are then consistent and no matches need be discarded. Other methods of finding matches are disclosed in which suffix trees are used in conjunction with square matrices to analyze the data strings.
-
Citations
8 Claims
-
1. A computer implemented method of finding maximal matches in a data string comprising the steps of:
-
creating a suffix tree representing the data string; for each internal node in said suffix tree, generating a list having an entry for each suffix associated with said node, each entry indicating (i) the position of the start of its associated suffix in said data string and (ii) the left context relative to the position, each node having a pathlength indicating the number of symbols between the root of said suffix tree and said node; recognizing matches having the length of the pathlength of said node for each pair of entries having different left context; generating a list of said matches; and reporting said list of matches. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer implemented method of locating duplications within a text, the method comprising the steps of:
-
dynamically encoding one or more characters of said text in a manner in which like parameters contained within a string are labelled to identify their positional relationships within the string; creating a modified suffix tree from said encoded data; applying said modified suffix tree to identify duplications in said text, said duplications representing identical or approximate matches of said text prior to its encoding, wherein said duplications are of at least a threshold length; and reporting said identified duplications in said text. - View Dependent Claims (7, 8)
-
Specification