INDEX OPTIMIZATION FOR RANKING USING A LINEAR MODEL
First Claim
1. A computer-implemented method for reducing an amount of ranking data analyzed at query time, the method comprising computer-implemented operations for:
- at index time, selecting a term from a master index, the term corresponding to a number of documents greater than a threshold;
selecting a set of documents that includes the term based on the master index;
determining a rank for each document in the set of documents that contains the term;
assigning each document in the set of documents that contains the term to a top document list or a bottom document list based on the rank; and
storing predefined values of at least part of the rank in the top document list for documents in the top document list and not storing the predefined values of at least part of the rank in the bottom document list for documents in the bottom document list.
2 Assignments
0 Petitions
Accused Products
Abstract
Technologies are described herein for providing a more efficient approach to ranking search results. An illustrative technology reduces an amount of ranking data analyzed at query time. In the technology, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a top document list or a bottom document list based on the rank. Predefined values of at least part of the rank are stored in the top document list for documents in the top document list and are not stored in the bottom document list for documents in the bottom document list.
-
Citations
20 Claims
-
1. A computer-implemented method for reducing an amount of ranking data analyzed at query time, the method comprising computer-implemented operations for:
-
at index time, selecting a term from a master index, the term corresponding to a number of documents greater than a threshold; selecting a set of documents that includes the term based on the master index; determining a rank for each document in the set of documents that contains the term; assigning each document in the set of documents that contains the term to a top document list or a bottom document list based on the rank; and storing predefined values of at least part of the rank in the top document list for documents in the top document list and not storing the predefined values of at least part of the rank in the bottom document list for documents in the bottom document list. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A computer-implemented method for reducing an amount of ranking data analyzed at query time, the method comprising computer-implemented operations for:
-
at index time, selecting a term from a master index, the term corresponding to a number of documents greater than a threshold; selecting a set of documents that includes the term based on the master index; determining a rank for each document in the set of documents that contains the term; assigning each document in the set of documents that contains the term to a top document list for the term or a bottom document list for the term based on the rank; storing predefined values of at least part of the rank in the top document list for documents in the top document list and not storing the predefined values of at least part of the rank in the bottom document list for documents in the bottom document list; at query time, determining whether each query term in a query is common; upon determining that each query term is common, populating a document result set with at least a subset of documents from the top document list and the bottom document list that satisfy the query, each of the subset of documents occurring in the top document list for at least one term in the query; upon populating the document result set, retrieving the predefined values from the top document list; ranking the document result set according to a linear model based on the pre-computed values for documents from the top document list and zero values for documents from the bottom document list; and transmitting a reduced subset of the document result set having the highest linear rank to a ranking function adapted to re-rank the documents in the reduced subset. - View Dependent Claims (11, 12, 13, 14, 15, 16)
-
-
17. A computer-readable storage medium having stored thereon a data structure representing a key that includes a list of documents corresponding to a query term, the key being adapted to be accessed by a search engine in response to a query, the key comprising:
-
a document identifier mask containing a first set of bits defining a range of documents in the list; and a bitmap containing a second set of bits indicating which of the documents from the range contain the query term, the documents corresponding to document identifiers, the document identifiers being a function of the document identifier mask and the bitmap. - View Dependent Claims (18, 19, 20)
-
Specification