Methods and Systems for Compressing Indices
2 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.
8 Citations
49 Claims
-
1-29. -29. (canceled)
-
30. A data processing system comprising:
-
memory storing an inverted index comprising a plurality of entries, wherein i) a first of the entries is a combined entry that includes a first index and a plurality of document identifiers each identifying a document, ii) a second of the entries includes a second index that identifies at least one word, concept, or image and a pointer to the first of the entries, and iii) a third of the entries includes a third index that identifies at least one word, concept, or image that differs from the at least one word, concept, or image identified by the second index and a pointer to the first of the entries; and one or more data processors programmed to retrieve document identifiers that identify documents responsive to search queries, including receiving a first search query to which the at least one word, concept, or image identified by the second index is responsive, locating the second of the entries in the inverted index using the first search query, following the pointer in the second of the entries to the first of the entries, retrieving at least some of the plurality of document identifiers in the first of the entries for responding to the first search query, receiving a second search query to which the at least one word, concept, or image identified by the third index is responsive, locating the third of the entries in the inverted index using the second search query, following the pointer in the third of the entries to the first of the entries, retrieving at least some of the plurality of document identifiers in the first of the entries for responding to the second search query. - View Dependent Claims (31, 32, 33, 34, 35)
-
-
36. A data processing system comprising:
-
memory storing an inverted index comprising a plurality of entries, wherein i) a first of the entries is a combined entry that includes a first index and a plurality of document identifiers each identifying a document, and ii) a second of the entries includes a second index that identifies at least one word or concept and a pointer to the first of the entries; and one or more data processors programmed to retrieve document identifiers that identify documents responsive to a search query, the retrieving including receiving a search query to which the at least one word or concept identified by the second index is responsive, locating the second of the entries in the inverted index using the search query, following the pointer in the second of the entries to the first of the entries and retrieving at least some of the plurality of document identifiers therefrom, processing two or more of the documents identified by the retrieved document identifiers to determine a strength of expression of the at least one word or concept in each of the processed documents; and ranking the processed documents based at least in part on the determined strengths of expression. - View Dependent Claims (37, 38, 39, 40)
-
-
41. A data processing system comprising:
-
memory storing an inverted index comprising a plurality of entries, wherein i) a first of the entries includes a first index that identifies a first word or concept and a plurality of document identifiers each identifying a respective document in which the word or concept identified by the first index is expressed, and ii) a second of the entries includes a second index that identifies a second word or a concept and a plurality of document identifiers each identifying a respective document in which the word or concept identified by the second index is expressed; and one or more data processors programmed to perform operations including; determining that a cost of compressing the first of the entries and the second of the entries is acceptable, and in response to determining that the cost is acceptable, creating a combined entry in the inverted index, wherein the combined entry includes an index and at least one of the plurality of document identifiers in the first of the entries that is absent from the second of the entries. - View Dependent Claims (42, 43, 44, 45, 46, 47, 48, 49)
-
Specification