Text storage and retrieval system and method
First Claim
1. A computer-based system for storing text, comprising:
a lexical parser which divides said text into search terms and all other information;
a word vector table which stores said search terms solely as an inverted structure, wherein each said search term is stored uniquely and each said search term is coupled with a list of its positions within said text; and
a first auxiliary data structure which stores only said other information;
wherein said word vector table and said first auxiliary data structure contain sufficient information substantially to reproduce said text.
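The elements of claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The names `store`, `reconstruct`, `word_vectors`, and `separators` are hypothetical. The sketch also assumes that a search term is any run of letters or digits, that positions are word ordinals, and that capitalization stays inside the stored term (a real system might move it to the auxiliary structure).

```python
import re
from collections import defaultdict

# Assumption: a "search term" is any run of ASCII letters or digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")

def store(text):
    """Split text into an inverted term table and an auxiliary separator list."""
    word_vectors = defaultdict(list)  # term -> list of word positions
    separators = []                   # auxiliary structure: text between terms
    last_end = 0
    for pos, match in enumerate(TOKEN.finditer(text)):
        separators.append(text[last_end:match.start()])
        word_vectors[match.group()].append(pos)  # each term is stored once
        last_end = match.end()
    separators.append(text[last_end:])  # trailing non-term text
    return dict(word_vectors), separators

def reconstruct(word_vectors, separators):
    """Rebuild the text using only the two structures."""
    terms = [None] * (len(separators) - 1)
    for term, positions in word_vectors.items():
        for pos in positions:
            terms[pos] = term
    pieces = []
    for sep, term in zip(separators, terms):
        pieces.append(sep)
        pieces.append(term)
    pieces.append(separators[-1])
    return "".join(pieces)

text = "The cat saw the cat."
table, aux = store(text)
print(table["cat"])                      # word positions of "cat"
print(reconstruct(table, aux) == text)   # round trip check
```

The term strings appear only as keys of the table, so the original text is never kept alongside the index; the separator list supplies everything the table leaves out.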
Abstract
An improved method and system for storing and retrieving information written as text. The method and system store most words of the text solely in an inverted structure, and the remainder of the text's information in an auxiliary structure. The structures can be quickly searched for keyword information, provide highly efficient storage, and can be reconstituted into the original text.
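As a rough illustration of why an inverted structure with position lists is quick to search, the Python sketch below looks up single keywords and adjacent-word phrases without scanning the text. The function names `build_positions` and `find_phrase` are hypothetical, as are the lowercase tokenization and the phrase query itself, which the abstract does not describe; it is shown only as one thing position lists make possible.

```python
import re
from collections import defaultdict

def build_positions(text):
    """Map each lowercased word to the list of its word positions."""
    index = defaultdict(list)
    for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        index[word].append(pos)
    return index

def find_phrase(index, phrase):
    """Return start positions where the words of `phrase` occur consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    starts = set(index[words[0]])
    for offset, word in enumerate(words[1:], start=1):
        # A later word at position p implies a phrase start at p - offset.
        starts &= {p - offset for p in index[word]}
    return sorted(starts)

index = build_positions("The quick fox and the quick dog")
print(find_phrase(index, "quick"))      # every position of one keyword
print(find_phrase(index, "quick dog"))  # positions where the phrase begins
```

A keyword query costs one table lookup, and a phrase query costs one set intersection per additional word.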
294 Citations
20 Claims
1. A computer-based system for storing text, comprising:
a lexical parser which divides said text into search terms and all other information;
a word vector table which stores said search terms solely as an inverted structure, wherein each said search term is stored uniquely and each said search term is coupled with a list of its positions within said text; and
a first auxiliary data structure which stores only said other information;
wherein said word vector table and said first auxiliary data structure contain sufficient information substantially to reproduce said text.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer-based method for storing text, comprising the steps of:
dividing said text into search terms and all other information;
storing said search terms solely as an inverted structure, wherein each said search term is stored uniquely and each said search term is coupled with a list of its positions within said text; and
storing only said other information in a first auxiliary data structure.
Dependent claims: 14, 15
16. A computer-based method for compressing a particular string of symbols, comprising the steps of:
providing a large number of strings similar to said particular string;
assigning a value to nearly all substrings contained within said large number of strings, said value being equal to the frequency of occurrence of said substring in said large number of strings, multiplied by the length of said substring;
storing the highest valued substrings in a list;
assigning to each substring in said list a unique fixed-length code; and
replacing each substring contained both within said particular string and said list with the code corresponding to said substring.
Dependent claims: 18, 19, 20
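The steps of claim 16 can be sketched in Python as follows. This is a simplified reading, not the patented implementation. The names `build_codebook`, `compress`, and `decompress` are hypothetical. The sketch further assumes that substrings are limited to 2..`max_len` symbols, that overlapping occurrences each count toward frequency, that ties are broken alphabetically, and that each fixed-length code is a single Unicode private-use character that never appears in the input.

```python
from collections import Counter

def build_codebook(corpus, max_len=8, size=4):
    """Value each substring as frequency * length and keep the `size` best."""
    counts = Counter()
    for s in corpus:
        for n in range(2, max_len + 1):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    ranked = sorted(counts, key=lambda sub: (-counts[sub] * len(sub), sub))
    # Assumption: one private-use character serves as each fixed-length code.
    return {sub: chr(0xE000 + k) for k, sub in enumerate(ranked[:size])}

def compress(s, codebook):
    """Replace listed substrings with their codes, trying longest entries first."""
    by_length = sorted(codebook, key=len, reverse=True)
    out, i = [], 0
    while i < len(s):
        for sub in by_length:
            if s.startswith(sub, i):
                out.append(codebook[sub])
                i += len(sub)
                break
        else:
            out.append(s[i])  # no listed substring starts here: keep the symbol
            i += 1
    return "".join(out)

def decompress(s, codebook):
    """Expand every code back into its substring."""
    reverse = {code: sub for sub, code in codebook.items()}
    return "".join(reverse.get(ch, ch) for ch in s)

corpus = ["GET /index.html", "GET /about.html", "GET /index.php"]
codebook = build_codebook(corpus)
packed = compress(corpus[0], codebook)
print(len(corpus[0]), len(packed))
print(decompress(packed, codebook) == corpus[0])
```

With this small corpus the four best-valued substrings overlap one another heavily, so only one of them can be used in any given string. A one-shot ranking of this kind does nothing to prevent that.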
17. A computer-based method for compressing a particular string of symbols, comprising the steps of:
providing a large number of strings similar to said particular string;
providing a temporary list of substrings contained within said large number of strings;
assigning a value to all substrings in said temporary list, said value being equal to the frequency of occurrence of said substring in said large number of strings, multiplied by the length of said substring;
subtracting from each string in said large number of strings each substring contained in said temporary list;
assigning a value to all remaining substrings in said large number of strings, said value being equal to the frequency of occurrence of said substring in said large number of strings, multiplied by the length of said substring;
replacing all substrings in said temporary list having a value lower than any said remaining substring with the highest valued said remaining substrings;
iterating said assigning, subtracting, assigning, and replacing steps until said temporary list is substantially unchanged from the previous iteration;
assigning to each substring in said temporary list a unique fixed-length code; and
replacing each substring contained both within said particular string and said list with the code corresponding to said substring.
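The iterative refinement in claim 17 can be sketched in Python under one possible reading of the claim language. The names `score`, `subtract`, and `refine` are hypothetical. Three simplifications are assumptions of this sketch and not part of the claim: the replacing step is approximated by keeping the best-valued entries from the union of the current list and the remaining substrings, ties are broken alphabetically, and a `max_rounds` guard stops the loop if the list never settles.

```python
from collections import Counter

def score(fragments, max_len):
    """Value every substring of 2..max_len symbols as occurrences * length."""
    counts = Counter()
    for frag in fragments:
        for n in range(2, max_len + 1):
            for i in range(len(frag) - n + 1):
                counts[frag[i:i + n]] += 1
    return {sub: c * len(sub) for sub, c in counts.items()}

def subtract(corpus, entries):
    """Cut each listed substring out of the corpus, leaving the fragments."""
    fragments = list(corpus)
    for sub in sorted(entries, key=len, reverse=True):
        fragments = [p for frag in fragments for p in frag.split(sub) if p]
    return fragments

def refine(corpus, seed, max_len=8, max_rounds=20):
    """Iterate the value, subtract, value, replace steps until the list is stable."""
    table = sorted(seed)
    for _ in range(max_rounds):  # assumption: cap the number of iterations
        listed = {sub: sum(s.count(sub) for s in corpus) * len(sub)
                  for sub in table}
        remaining = score(subtract(corpus, table), max_len)
        merged = {**remaining, **listed}
        ranked = sorted(merged, key=lambda sub: (-merged[sub], sub))
        best = sorted(ranked[:len(table)])
        if best == table:
            break  # list unchanged from the previous iteration
        table = best
    return table

corpus = ["GET /index.html", "GET /about.html", "GET /index.php"]
table = refine(corpus, seed=["GE", "ml"])
codes = {sub: chr(0xE000 + k) for k, sub in enumerate(table)}
print(table)
```

Starting from a deliberately poor seed list, the low-valued entries are displaced by longer, more frequent substrings found in what remains after subtraction. The final line assigns one private-use character to each surviving entry as its fixed-length code, the same assumption about code format as any single-symbol coding scheme.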
Specification