Methods and systems for compressing indices
First Claim
Patent Images
1. A data processing system comprising:
- memory storing an inverted index comprising a plurality of entries, whereini) a first of the entries includesa plurality of document identifiers each identifying a document,ii) a second of the entriesidentifies at least one word, concept, or image and includes a pointer to the first of the entries, andiii) a third of the entriesidentifies at least one word, concept, or image that differs from the at least one word, concept, or image identified by the second of the entries and includesa pointer to the first of the entries,wherein the first entry is a combined entry related to the second of the entries and the third of the entries; and
one or more data processors programmed to retrieve document identifiers that identify documents responsive to search queries, the retrieving includingreceiving a first search query to which the at least one word, concept, or image identified by the second of the entries is responsive,locating the second of the entries in the inverted index using the first search query,following the pointer in the second of the entries to the first of the entries,retrieving at least some of the plurality of document identifiers in the first of the entries for responding to the first search query,receiving a second search query to which the at least one word, concept, or image identified by the third the entries is responsive,locating the third of the entries in the inverted index using the second search query,following the pointer in the third of the entries to the first of the entries, andretrieving at least some of the plurality of document identifiers in the first of the entries for responding to the second search query.
2 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods for compressing indices are described. In one aspect, a plurality of items are selected where each item has an entry in an inverted index and each item entry comprises a listing of articles that the item appears in. At least a first item entry and a second item entry are determined for compression and the second item entry is compressed into the first item entry resulting in a compressed first item entry.
14 Citations
20 Claims
-
1. A data processing system comprising:
-
memory storing an inverted index comprising a plurality of entries, wherein i) a first of the entries includes a plurality of document identifiers each identifying a document, ii) a second of the entries identifies at least one word, concept, or image and includes a pointer to the first of the entries, and iii) a third of the entries identifies at least one word, concept, or image that differs from the at least one word, concept, or image identified by the second of the entries and includes a pointer to the first of the entries, wherein the first entry is a combined entry related to the second of the entries and the third of the entries; and one or more data processors programmed to retrieve document identifiers that identify documents responsive to search queries, the retrieving including receiving a first search query to which the at least one word, concept, or image identified by the second of the entries is responsive, locating the second of the entries in the inverted index using the first search query, following the pointer in the second of the entries to the first of the entries, retrieving at least some of the plurality of document identifiers in the first of the entries for responding to the first search query, receiving a second search query to which the at least one word, concept, or image identified by the third the entries is responsive, locating the third of the entries in the inverted index using the second search query, following the pointer in the third of the entries to the first of the entries, and retrieving at least some of the plurality of document identifiers in the first of the entries for responding to the second search query. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A data processing system comprising:
-
memory storing an inverted index comprising a plurality of entries, wherein i) a first of the entries includes a plurality of document identifiers each identifying a document, and ii) a second of the entries identifies at least one word or concept and includes a pointer to the first of the entries, wherein the first of the entries is a combined entry related to the second of the entries and at least one additional of the entries; and one or more data processors programmed to retrieve document identifiers that identify documents responsive to a search query, the retrieving including receiving a search query to which the at least one word or concept identified by the second of the entries is responsive, locating the second of the entries in the inverted index using the search query, following the pointer in the second of the entries to the first of the entries and retrieving at least some of the plurality of document identifiers therefrom, processing two or more of the documents identified by the retrieved document identifiers to determine a strength of expression of the at least one word or concept in each of the processed documents, and ranking the processed documents based at least in part on the determined strengths of expression. - View Dependent Claims (8, 9, 10, 11)
-
-
12. A data processing system comprising:
-
memory storing an inverted index comprising a plurality of entries, wherein i) a first of the entries identifies a first word or concept and includes a plurality of document identifiers each identifying a respective document in which the word or concept identified by the first of the entries is expressed, and ii) a second of the entries identifies a second word or concept and includes a plurality of document identifiers each identifying a respective document in which the word or concept identified by the second of the entries is expressed; and one or more data processors programmed to perform operations including; determining that a cost of compressing the first of the entries and the second of the entries is acceptable, and in response to determining that the cost is acceptable, creating a combined entry in the inverted index, wherein the combined entry includes at least one of the plurality of document identifiers in the first of the entries that is absent from the second of the entries. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20)
-
Specification