COMBINED CONTENT INDEXING AND DATA REDUCTION
Abstract
Data storage is improved by combining content indexing and data reduction in text-containing files by using common word elimination. Raw data is processed by finding words in selected files, creating an index of found words, and replacing the words in the raw data with pointers to the corresponding words in the index. Each word appears only once in the index. Consequently, the index is relatively small and the procedure is completely reversible. In particular, the index is small relative to other methods because the data is transformed in place, and the transformed data and index are used together to capture the total information about the data.
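The procedure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the names `WordIndex`, `reduce_text`, and `restore_text` are hypothetical, a "word" is assumed to be a run of word characters, pointers are assumed to be integer positions in the index, and whitespace and punctuation are assumed to be kept verbatim so that the transformation stays reversible. None of these details are specified in the text above.

```python
import re


class WordIndex:
    """Hypothetical index in which each word appears only once."""

    def __init__(self):
        self.words = []       # pointer (list position) -> word
        self.pointer_of = {}  # word -> pointer

    def pointer_for(self, word):
        # Add the word to the index only if it is not already there.
        if word not in self.pointer_of:
            self.pointer_of[word] = len(self.words)
            self.words.append(word)
        return self.pointer_of[word]


def reduce_text(text, index):
    """Replace every word in `text` with a pointer into `index`.

    Assumption: separators (spaces, punctuation) are stored as plain
    strings between the integer pointers.
    """
    reduced = []
    # With a capturing group, re.split returns separators at even
    # positions and the captured words at odd positions.
    for i, piece in enumerate(re.split(r"(\w+)", text)):
        if i % 2 == 1:
            reduced.append(index.pointer_for(piece))
        elif piece:
            reduced.append(piece)
    return reduced


def restore_text(reduced, index):
    """Reverse `reduce_text` by following each pointer back to its word."""
    return "".join(
        index.words[item] if isinstance(item, int) else item
        for item in reduced
    )
```

Under these assumptions, reducing "the cat saw the dog" stores "the" once in the index and emits the same pointer for both occurrences, and calling `restore_text` on the result returns the original string. Reusing the same `WordIndex` across several files lets words shared between files occupy a single index entry.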
30 Claims
1. A computer program product stored on computer-readable media which, when executed, is operable to index content and reduce data, comprising:
logic operable to find individual semantic units in at least one file in data storage;
logic operable to determine whether a found semantic unit is in an index, and if the found semantic unit is not in the index then to add that semantic unit to the index; and
logic operable to replace found semantic units with pointers to corresponding semantic units in the index.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

11. A method for indexing content and reducing data, comprising the steps of:
finding individual semantic units in at least one file in data storage;
in response to finding a semantic unit, determining whether the semantic unit is in an index, and if the found semantic unit is not in the index then adding that semantic unit to the index; and
replacing found semantic units with pointers to corresponding semantic units in the index.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20

21. Apparatus for processing and storing data, including indexing content and reducing the data, comprising:
storage media operable to store data and an index; and
a processor operable to:
find individual semantic units in at least one file in the storage media;
determine whether a found semantic unit is in the index, and if the found semantic unit is not in the index then to add that semantic unit to the index; and
replace found semantic units with pointers to the corresponding semantic units in the index.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30

Specification