Combined content indexing and data reduction
Abstract
Data storage is improved by combining content indexing and data reduction in text-containing files by using common word elimination. Raw data is processed by finding words in selected files, creating an index of found words, and replacing the words in the raw data with pointers to the corresponding words in the index. Each word appears only once in the index. Consequently, the index is relatively small and the procedure is completely reversible. In particular, the index is small relative to other methods because the data is transformed in place, and the transformed data and index are used together to capture the total information about the data.
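The procedure described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the patented implementation: the function names `reduce_text` and `restore_text` are hypothetical, a "word" is taken to be whatever the regex `\w+` matches, pointers are plain Python integers rather than any particular on-disk encoding, and the data is held in an in-memory list instead of being transformed in place in data storage.

```python
import re

def reduce_text(text, index, positions):
    """Single pass: index each word and replace it with a pointer.

    index     -- list of unique words (each word stored once)
    positions -- dict mapping word -> its pointer into `index`
    Returns a list mixing int pointers (words) and str separators.
    """
    reduced = []
    # re.split with a capturing group alternates separator, word, separator, ...
    for i, piece in enumerate(re.split(r"(\w+)", text)):
        if i % 2 == 1:  # odd positions are the captured words
            ptr = positions.get(piece)
            if ptr is None:  # word not yet in the index: add it
                ptr = len(index)
                index.append(piece)
                positions[piece] = ptr
            reduced.append(ptr)  # replace the word with its index pointer
        elif piece:
            reduced.append(piece)  # keep separators verbatim for reversibility
    return reduced

def restore_text(reduced, index):
    """Reverse the transformation using the reduced data and the index together."""
    return "".join(index[t] if isinstance(t, int) else t for t in reduced)

index, positions = [], {}
original = "the cat saw the other cat"
reduced = reduce_text(original, index, positions)
restored = restore_text(reduced, index)
```

In this sketch the repeated words "the" and "cat" each occupy one index entry, and the original text is recovered only by combining the reduced data with the index, which mirrors the reversibility property the abstract describes.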
30 Claims
1. A computer program product stored on a non-transitory computer-readable media comprising:
- logic which concurrently indexes content and reduces data in data storage, including;
logic which finds individual semantic units in at least one file in data storage, each semantic unit including at least one word;
logic which determines whether a found semantic unit is in an index, and if the found semantic unit is not in the index then adds that semantic unit to the index; and
logic which replaces found semantic units with index pointers to corresponding semantic units in the index.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for indexing content and reducing data, comprising the steps of:
- concurrently indexing content and reducing data in data storage by;
finding individual semantic units in at least one file in data storage, each semantic unit including at least one word;
in response to finding a semantic unit, determining whether the semantic unit is in an index, and if the found semantic unit is not in the index then adding that semantic unit to the index; and
replacing found semantic units with index pointers to corresponding semantic units in the index.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. Apparatus for processing and storing data, including indexing content and reducing the data, comprising:
- storage media that concurrently stores data and an index; and
- a processor that;
finds individual semantic units in at least one file in the storage media, each semantic unit including at least one word;
determines whether a found semantic unit is in the index, and if the found semantic unit is not in the index then adds that semantic unit to the index; and
replaces found semantic units with index pointers to the corresponding semantic units in the index.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification