Methods and systems for compressing indices
First Claim
1. A method implemented by a data processing system of a single computer or a network of computer processors, the method comprising:
- selecting from an inverted index first and second entries, each of which includes an index identifying a concept, a plurality of document identifiers each identifying a document in which the concept identified by the index is expressed, and a plurality of concept values each representing a strength of the expression of the concept identified by the index in a respective identified document;
- determining, by the data processing system, a plurality of new concept values from the plurality of concept values in the first and second entries; and
- combining, by the data processing system, the first and second entries into a combined entry, the combined entry including a plurality of document identifiers each identifying a document in which at least one of the concepts identified by the indices of the first and second entries is expressed, and the plurality of new concept values determined from the plurality of concept values in the first and second entries.
2 Assignments
0 Petitions
Abstract
Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.
19 Citations
21 Claims
1. A method implemented by a data processing system of a single computer or a network of computer processors, the method comprising:
- selecting from an inverted index first and second entries, each of which includes an index identifying a concept, a plurality of document identifiers each identifying a document in which the concept identified by the index is expressed, and a plurality of concept values each representing a strength of the expression of the concept identified by the index in a respective identified document;
- determining, by the data processing system, a plurality of new concept values from the plurality of concept values in the first and second entries; and
- combining, by the data processing system, the first and second entries into a combined entry, the combined entry including a plurality of document identifiers each identifying a document in which at least one of the concepts identified by the indices of the first and second entries is expressed, and the plurality of new concept values determined from the plurality of concept values in the first and second entries.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
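The select, determine, and combine steps of claim 1 can be illustrated with a minimal sketch. The `Entry` class, the `combine` function, and the sample data are hypothetical names introduced here for illustration. The claim only requires that new concept values be determined from the original ones; averaging them (with a missing value treated as 0.0) is an assumption made for this example, not the rule stated in the patent.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One inverted-index entry (hypothetical layout for illustration)."""
    index: str                        # identifies the concept
    concept_values: dict[int, float]  # document identifier -> concept value


def combine(first: Entry, second: Entry) -> Entry:
    """Merge two entries into one combined entry.

    Assumption: the new concept value for a document is the mean of the two
    original values, with a missing value counted as 0.0.
    """
    # Union of document identifiers: every document expressing at least one
    # of the two concepts appears in the combined entry.
    doc_ids = sorted(first.concept_values.keys() | second.concept_values.keys())
    new_values = {
        doc_id: (first.concept_values.get(doc_id, 0.0)
                 + second.concept_values.get(doc_id, 0.0)) / 2
        for doc_id in doc_ids
    }
    return Entry(index=f"{first.index}+{second.index}", concept_values=new_values)


# Two entries selected from a toy inverted index.
car = Entry("car", {1: 0.75, 2: 0.5})
automobile = Entry("automobile", {2: 1.0, 3: 0.25})

combined = combine(car, automobile)
print(combined.concept_values)
```

The combined entry lists documents 1, 2, and 3, although neither original entry listed all three, and every stored value is derived from the originals rather than copied from one of them.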
9. A system comprising:
- a data processing system formed of a single computer or a network of computer processors; and
- an inverted index database stored on one or more data storage devices, the inverted index database comprising a first entry and a combined entry, wherein:
  - the first entry comprises an entry index identifying a concept and a pointer to the combined entry; and
  - the combined entry comprises a plurality of document identifiers each identifying a document and a plurality of concept values each associated with a corresponding document identifier, wherein a strength at which the concept identified by the index of the first entry is expressed in a first document differs from the concept value associated with the document identifier that identifies the first document.
Dependent claims: 10, 11, 12, 13, 14
15. A system comprising:
- a data processing system formed of a single computer or a network of computer processors; and
- an inverted index database stored on one or more data storage devices, the inverted index database comprising a first entry and a combined entry, wherein:
  - the first entry comprises an index identifying a concept and a pointer to the combined entry; and
  - the combined entry comprises a plurality of document identifiers each identifying a document, wherein the concept identified by the index of the first entry does not appear in at least one of the documents identified by the document identifiers included in the combined entry.
Dependent claims: 16, 17, 18, 19, 20, 21
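The index layout described in claims 9 and 15 can be sketched as follows. The class names `CombinedEntry` and `PointerEntry`, the `lookup` helper, and all sample values are hypothetical and chosen only for illustration; the "pointer" is modeled as a plain Python object reference, which is one possible reading rather than the patent's stated storage format.

```python
from dataclasses import dataclass


@dataclass
class CombinedEntry:
    doc_ids: list[int]
    concept_values: list[float]  # one value per document identifier


@dataclass
class PointerEntry:
    index: str              # identifies the concept
    target: CombinedEntry   # pointer to the shared combined entry


# Two concepts share a single combined entry instead of storing two lists.
shared = CombinedEntry(doc_ids=[1, 2, 3], concept_values=[0.375, 0.75, 0.125])
inverted_index = {
    "car": PointerEntry("car", shared),
    "automobile": PointerEntry("automobile", shared),
}


def lookup(index: dict[str, PointerEntry], concept: str) -> dict[int, float]:
    """Follow the entry's pointer and return document id -> stored value."""
    combined = index[concept].target
    return dict(zip(combined.doc_ids, combined.concept_values))


# Hypothetical uncompressed strengths for "car", used only for comparison.
true_car_strength = {1: 0.75, 2: 0.5}

result = lookup(inverted_index, "car")
# Claim 9: the stored value may differ from the concept's actual strength.
differs = result[1] != true_car_strength[1]
# Claim 15: the combined entry may list documents where the concept is absent.
absent_docs = [d for d in result if d not in true_car_strength]
print(result, differs, absent_docs)
```

Sharing one combined entry saves space at the cost of precision: a lookup for "car" returns document 3, where the concept does not appear, and a value for document 1 that differs from its original strength.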
Specification