Systems and methods for indexing each level of the inner structure of a string over a language having a vocabulary and a grammar
First Claim
1. A computer-executable method for indexing at least one string containing a language having a vocabulary and a grammar for subsequent retrieval comprising:
- receiving at least one input string containing a language having a vocabulary and a grammar;
dividing each string into a plurality of component parts according to the grammar, wherein each component part includes at least one word and at least one grammatical relationship of the at least one word within the language of the input string;
determining a bit index for each component part;
unioning, for each at least one set of component parts, the bit indices of the component parts within that set of component parts into a bit index for that set of component parts; and
recording the bit index for the at least one set of component parts in a memory for subsequent use in document retrieval.
3 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods for indexing and searching the inner structure of a string over a language having a vocabulary and a grammar using bit vectors. The index preserves the inner gramatical structure of the string while allowing for a fast search. A single search provides immediate access to every level of a document, without having to re-search a single string to determine which sub-parts of that string match the search string. When a string is indexed, the index maintains a compositional representation and the grammatical relationship between the elements of the vocabulary according to the language. The string is then indexed on different levels by disregarding some of the grammatical relationships of component levels.
-
Citations
34 Claims
-
1. A computer-executable method for indexing at least one string containing a language having a vocabulary and a grammar for subsequent retrieval comprising:
-
receiving at least one input string containing a language having a vocabulary and a grammar; dividing each string into a plurality of component parts according to the grammar, wherein each component part includes at least one word and at least one grammatical relationship of the at least one word within the language of the input string; determining a bit index for each component part; unioning, for each at least one set of component parts, the bit indices of the component parts within that set of component parts into a bit index for that set of component parts; and recording the bit index for the at least one set of component parts in a memory for subsequent use in document retrieval. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 32)
-
-
18. A computer-executed string indexing system that indexes at least one string containing a language having a vocabulary and a grammar for subsequent retrieval, comprising:
-
a dividing circuit, routine, or application that inputs the at least one string, divides the string according to the grammar and outputs at least one component part, each component part including at least one word and at least one grammatical relationship of the at least one word within the language of the input string; an element index assigning circuit, routine, or application that assigns an index value to each component part; an index converting circuit, routine, or application that converts each index value into a corresponding bit index; a unioning circuit, routine, or application that inputs two or more bit indices and outputs a bit index representing the union of the input two or more bit indices; and a memory that stores the bit index for subsequent retrieval. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33)
-
-
34. A storage medium storing a set of program instructions executable on a data processing device and usable for indexing at least one string over a language having a vocabulary and a grammar for subsequent retrieval, the set of program instructions comprising:
-
instructions for receiving at least one input string containing a language having a vocabulary and a grammar; instructions for dividing each string into a plurality of component parts according to the grammar, each component part including at least one word and at least one grammatical relationship of the at least one word within the language of the input string; instructions for determining a bit index for each component part; instructions for unioning, for each at least one set of component parts, the bit indices of the component parts within that set of component parts into a bit index for that set of component parts; and instructions for recording the bit index for the set in a memory for subsequent use in document retrieval.
-
Specification