Method and apparatuses for creating a full text index accommodating child words
Abstract
A computer system and method for information indexing and retrieval. The full text index can searchably accommodate linguistic, phonetic, conceptual, contextual and other types of relational and descriptive information. The full text index is created in two phases. In the first phase, a word list symbol table, an alphabetically ordered list and a non-repeating word number stream are constructed from the source text. In the second phase, a word number access array and in-memory full text index are constructed and then index data is merged into the final index.
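The first phase named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `phase_one`, the regex tokenizer, and the reading of "non-repeating" as "each word number appears at most once per granule" are assumptions made purely for illustration.

```python
import re

def phase_one(granules):
    """Hypothetical phase-one pass over a list of granule strings.

    Returns (symbol_table, ordered_words, stream), where:
      symbol_table  maps each word to a word number (order of first appearance),
      ordered_words is the alphabetically ordered word list,
      stream        holds, per granule, each word number at most once
                    (an assumed reading of "non-repeating").
    """
    symbol_table = {}
    stream = []
    for text in granules:
        seen = set()
        numbers = []
        # Assumed tokenizer: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            number = symbol_table.setdefault(word, len(symbol_table))
            if number not in seen:
                seen.add(number)
                numbers.append(number)
        stream.append(numbers)
    ordered_words = sorted(symbol_table)
    return symbol_table, ordered_words, stream

table, ordered, stream = phase_one(["the cat sat", "the cat and the dog"])
```

In this sketch the second phase would consume `stream` and `table` to build per-word location data; that step is left out here because the abstract gives no further detail about it.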
17 Claims
1. A method in a computer system for creating a word list associated with a source text including one or more documents, each document comprising a plurality of granules, each granule defining an indexing unit of text including one or more words, wherein the granule size is set to multiple levels, the method comprising the steps of:
(a) searching at least a portion of one of the documents for a first word;
(b) creating a parent structure which is associated with the first word and which has a location list;
(c) for each granule size, storing the location of the granule containing the first word in the location list of the parent structure for the first word, such that the parent structure stores the location of the granules containing the first word;
(d) creating one or more child structures which are associated with one or more child words, each child word being related to the first word and the child structure having a location list associated therewith, wherein each child word relates to the first word by comprising additional information about the first word, and wherein the parent structure includes a pointer to the child structure; and
(e) for each granule size, storing the location of the granule containing the first word in the location list of the child structure, such that the child structure stores the location of the granules containing the first word.
2. The method of claim 1, further comprising the steps of:
(a) searching at least a portion of one of the documents for a second word;
(b) searching the word list for a parent structure associated with the second word;
(c) if a parent structure associated with the second word is located in step (b) of claim 2 then storing the location of the granule containing the second word in the location list of the parent structure associated with the second word;
(d) searching the word list for a child structure associated with the second word; and
(e) if a child structure associated with the second word is located in step (d) then storing the location of the granule containing the second word in the location list of the child structure associated with the second word.
3. The method of claim 2, further comprising the following steps if a parent structure associated with the second word is not found in step (c) of claim 2:
(a) creating a parent structure associated with the second word and having a location list; and
(b) storing the location of the granule containing the second word in the location list of the parent structure associated with the second word.
4. The method of claim 2, further comprising the following steps if a child structure associated with the second word is not found in step (d) of claim 2:
(a) creating one or more child structures associated with one or more child words of the second word, each of the child structures having a location list associated therewith; and
(b) storing the location of the granule containing the second word in the location list of each of the child structures.
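Claims 1 through 4 together describe a find-or-create pattern: look up the parent and child structures for a word, create whichever is missing, and record the granule location in each. The following is a minimal sketch of that pattern under stated assumptions. The word list is modeled as a dictionary, only one granule level is tracked, and the names `ParentStructure`, `ChildStructure`, `record_word`, and `toy_child_words` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ChildStructure:
    word: str
    locations: list = field(default_factory=list)

@dataclass
class ParentStructure:
    word: str
    locations: list = field(default_factory=list)
    # Stands in for the claimed "pointer to the child structure".
    children: dict = field(default_factory=dict)

def add_location(locations, granule):
    # Record each granule once per location list.
    if granule not in locations:
        locations.append(granule)

def record_word(word_list, word, granule, child_words_of):
    """Store `granule` for `word`, creating missing structures on the way."""
    parent = word_list.get(word)
    if parent is None:                      # claim 3: no parent found, create one
        parent = ParentStructure(word)
        word_list[word] = parent
    add_location(parent.locations, granule)
    for child_word in child_words_of(word):
        child = parent.children.get(child_word)
        if child is None:                   # claim 4: no child found, create one
            child = ChildStructure(child_word)
            parent.children[child_word] = child
        add_location(child.locations, granule)
    return parent

def toy_child_words(word):
    # Assumed toy rule: strip a trailing "s" to get a root-like child word.
    return [word[:-1]] if len(word) > 1 and word.endswith("s") else []

word_list = {}
record_word(word_list, "cats", 0, toy_child_words)
record_word(word_list, "cats", 2, toy_child_words)
record_word(word_list, "dog", 2, toy_child_words)
```

Keeping the child structures inside the parent is one design choice among several; the claims only require that the parent point to the child and that both carry their own location lists.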
5. The method of claim 1, wherein each of the child words is an attribute of the first word.
6. The method of claim 5, further comprising the step of selecting the attribute from the group consisting of a sub-word attribute, a linguistic root attribute, a linguistic context attribute, and a phonetic attribute.
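The attribute kinds listed in claim 6 can be pictured with a small sketch that derives child words from a parent word. Every rule below is a deliberately crude stand-in chosen for illustration: the hyphen split, the suffix stripper, and the vowel-dropping phonetic key are assumptions, not the techniques of the patent, and the linguistic context attribute is omitted because it needs surrounding text. The function names are hypothetical.

```python
def sub_words(word):
    # Assumed sub-word rule: split hyphenated compounds into their parts.
    parts = [p for p in word.split("-") if p]
    return parts if len(parts) > 1 else []

def toy_root(word):
    # Assumed toy stemmer: strip one common English suffix, if any.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return None

def toy_phonetic_key(word):
    # Assumed toy phonetic key: first character, then later consonants only.
    # Expects a non-empty word.
    head, tail = word[0], word[1:]
    consonants = "".join(c for c in tail if c.isalpha() and c not in "aeiou")
    return (head + consonants).upper()

def child_words(word):
    """Return (attribute kind, child word) pairs for a non-empty word."""
    word = word.lower()
    children = [("sub-word", part) for part in sub_words(word)]
    root = toy_root(word)
    if root is not None:
        children.append(("root", root))
    children.append(("phonetic", toy_phonetic_key(word)))
    return children

indexing_children = child_words("indexing")
compound_children = child_words("full-text")
```

Tagging each child word with its attribute kind keeps, for example, a root child and a phonetic child distinct even when they happen to share the same spelling.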
7. The method of claim 1, further comprising the step of determining whether the child word corresponds to the first word.
8. The method of claim 1, further comprising the step of linking the parent structure to one or more of the child structures.
9. The method of claim 8, further comprising the step of linking one of the child structures to another child structure.
10. The method of claim 8, further comprising the steps of linking the parent structure to an intermediate link and linking the intermediate link to a child structure.
11. The method of claim 10, further comprising the step of linking the intermediate link to another intermediate link.
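The linking arrangements of claims 8 through 11 can be sketched as chained records. This is one possible reading, not the patent's layout: the classes `Parent`, `Link`, and `Child`, their field names, and the traversal order in `iter_children` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Child:
    word: str
    locations: list = field(default_factory=list)
    next_child: Optional["Child"] = None   # claim 9: child linked to another child

@dataclass
class Link:
    label: str
    child: Optional[Child] = None          # claim 10: intermediate link to a child
    next_link: Optional["Link"] = None     # claim 11: link to another link

@dataclass
class Parent:
    word: str
    first_child: Optional[Child] = None    # claim 8: parent linked to a child
    first_link: Optional[Link] = None      # claim 10: parent linked to a link

def iter_children(parent):
    """Yield every child reachable from the parent, direct children first."""
    child = parent.first_child
    while child is not None:
        yield child
        child = child.next_child
    link = parent.first_link
    while link is not None:
        child = link.child
        while child is not None:
            yield child
            child = child.next_child
        link = link.next_link

# Toy data: all words and labels below are made up for the example.
root = Child("mouse")
root.next_child = Child("MC")
inner = Link("context/noun", child=Child("noun"))
outer = Link("context", next_link=inner)
parent = Parent("mice", first_child=root, first_link=outer)
reachable = [c.word for c in iter_children(parent)]
```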
12. The method of claim 1, further comprising the step of building an index from the word list.
13. A computer readable medium comprising instructions for performing the method recited in claim 1.
14. A computer system comprising a processor for receiving and executing the instructions from the computer readable medium recited in claim 13.
15. A computer system for creating a word list associated with a source text including one or more documents, each document comprising one or more granules, each granule defining an indexing unit of text including one or more words, wherein the granule size can be varied to include varying amounts of text, the computer system comprising:
a means for searching the source text for a first word;
a means for creating a parent structure associated with the first word, wherein the parent structure comprises a location array;
a means for determining one or more child words which are associated with the first word, wherein each child word is an attribute of the first word such that each child word provides additional information about the first word;
a means for creating a child structure associated with one of the child words of the first word, wherein the child structure comprises a location array; and
a means for storing the location of the granule containing the first word in the location arrays of the parent structure and the child structure.
16. A computer system for creating a word list associated with a source text including one or more documents, each document comprising one or more granules, each granule defining an indexing unit of text including one or more words, wherein the granule size can be varied to include varying amounts of text, the computer system comprising:
a parent structure associated with a first word, wherein the first word is located in one of the documents, the parent structure comprising a location array for storing the location of the granule containing the first word; and
a child structure comprising a location array for storing the location of the granule containing the first word, wherein the child structure represents a child word and the child word is an attribute of the first word such that each child word relates to the first word.
17. A method in a computer system for creating a word list associated with a source text including one or more documents, each document comprising a plurality of granules, each granule defining an indexing unit of text including one or more words, wherein the granule size is set to multiple levels, the method comprising the steps of:
(a) searching at least a portion of one of the documents for a first word;
(b) creating a parent structure which is associated with the first word and which has a location list;
(c) for each granule size, storing the location of the granule containing the first word in the location list of the parent structure for the first word, such that the parent structure stores the location of the granules containing the first word;
(d) creating one or more child structures which are associated with one or more child words, each child word being related to the first word and the child structure having a location list associated therewith; and
(e) for each granule size, storing the location of the granule containing the first word in the location list of the child structure, such that the child structure stores the location of the granules containing the first word;
wherein each of the child words is an attribute of the first word, and wherein the method further comprises the step of selecting the attribute from the group consisting of a sub-word attribute, a linguistic root attribute, a linguistic context attribute, and a phonetic attribute.
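Steps (c) and (e) store a location "for each granule size". One way to picture this is a location list per granule level, kept in both the parent and the child structures. The sketch below assumes two illustrative levels named "document" and "paragraph" and assumes that text is scanned in order, so checking the last stored entry is enough to avoid duplicates. The dictionary layout and the names `new_structure` and `store` are hypothetical.

```python
from collections import defaultdict

def new_structure():
    # One location list per granule level, plus child structures by child word.
    return {"locations": defaultdict(list), "children": {}}

def store(word_list, word, granule_by_level, child_words=()):
    """Record the granule at every level in the parent and each child."""
    parent = word_list.setdefault(word, new_structure())
    targets = [parent]
    for child_word in child_words:
        targets.append(parent["children"].setdefault(child_word, new_structure()))
    for structure in targets:
        for level, granule in granule_by_level.items():
            locations = structure["locations"][level]
            # Assumes in-order scanning: a repeat can only match the last entry.
            if not locations or locations[-1] != granule:
                locations.append(granule)

word_list = {}
store(word_list, "mice", {"document": 0, "paragraph": 0}, ["mouse"])
store(word_list, "mice", {"document": 0, "paragraph": 2}, ["mouse"])
parent_locations = dict(word_list["mice"]["locations"])
child_locations = dict(word_list["mice"]["children"]["mouse"]["locations"])
```

With this layout a coarse query can consult only the "document" lists, while a finer one can use the "paragraph" lists of the same structure.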
Specification