Index optimization for ranking using a linear model
First Claim
1. A method for reducing an amount of ranking data analyzed at query time, comprising:
at index time, selecting a term from a master index, the term corresponding to a number of documents greater than a threshold;
selecting a set of documents that includes the term based on the master index;
determining a linear rank for each document in the set of documents that contains the term, the linear rank comprising a function of a term rank associated with each term in the each document and a static rank associated with the each document;
generating a high ranking index containing a first set of documents in the set of documents that contains the term where the linear rank of the each document in the first set of documents is greater than a rank threshold, the rank threshold being different from the threshold;
generating a low ranking index containing a second set of documents in the set of documents that contains the term where the linear rank of the each document in the second set of documents is less than the rank threshold;
generating a supplementary index containing the static rank of the each document in the set of documents that contains the term; and
storing the term rank corresponding to each term-document pair in the high ranking index and the low ranking index.
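The index-time steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary layouts, the names `SplitIndexes`, `linear_rank`, and `build_split_indexes`, and the weighted-sum form of the linear rank are all assumptions made for illustration. The claim only requires that the linear rank be a function of the term rank and the static rank, and it does not say where a document whose rank equals the rank threshold should go.

```python
from dataclasses import dataclass, field


@dataclass
class SplitIndexes:
    """Hypothetical container for the three indexes named in the claim."""
    high: dict = field(default_factory=dict)           # term -> {doc_id: term_rank}
    low: dict = field(default_factory=dict)            # term -> {doc_id: term_rank}
    supplementary: dict = field(default_factory=dict)  # doc_id -> static_rank


def linear_rank(term_rank, static_rank, w_term=1.0, w_static=1.0):
    # Assumed form: a weighted sum. The weights are hypothetical parameters.
    return w_term * term_rank + w_static * static_rank


def build_split_indexes(master_index, static_ranks, doc_count_threshold, rank_threshold):
    out = SplitIndexes()
    for term, postings in master_index.items():
        # Only terms found in more documents than the threshold are split.
        if len(postings) <= doc_count_threshold:
            continue
        high, low = {}, {}
        for doc_id, term_rank in postings.items():
            score = linear_rank(term_rank, static_ranks[doc_id])
            # Assumption: a score equal to the rank threshold goes to the low index.
            target = high if score > rank_threshold else low
            # The term rank is stored with each term-document pair.
            target[doc_id] = term_rank
            # The static rank is kept once per document in the supplementary index.
            out.supplementary[doc_id] = static_ranks[doc_id]
        out.high[term] = high
        out.low[term] = low
    return out


# Example data (invented for illustration).
master_index = {
    "search": {"d1": 0.9, "d2": 0.2, "d3": 0.5},
    "rare": {"d1": 0.4},
}
static_ranks = {"d1": 0.8, "d2": 0.1, "d3": 0.3}
indexes = build_split_indexes(
    master_index, static_ranks, doc_count_threshold=2, rank_threshold=1.0
)
```

In this example only "search" exceeds the document-count threshold, so "rare" is left to the master index and never appears in the high or low ranking index.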
Abstract
Technologies are described herein for providing a more efficient approach to ranking search results. One method reduces an amount of ranking data analyzed at query time. In the method, a term is selected, at index time, from a master index. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A simple rank is determined for each document in the set of documents that contains the term. Each document in the set of documents that contains the term is assigned to a high ranking index or a low ranking index based on the simple rank.
11 Claims
7. A method for reducing an amount of ranking data analyzed at query time, comprising:
(a) at index time, selecting a term from a master index, where a number of documents containing the term is greater than a threshold;
(b) selecting a set of documents that includes the term based on the master index;
(c) determining a linear rank for each document in the set of documents that contains the term, the linear rank comprising a function of a term rank associated with each term in the each document and a static rank associated with the each document;
(d) populating a high ranking index containing a first set of documents in the set of documents that contains the term where the linear rank of the each document in the first set of documents is greater than a rank threshold, the rank threshold being different from the threshold;
(e) populating a low ranking index containing a second set of documents in the set of documents that contains the term where the linear rank of the each document in the second set of documents is less than the rank threshold;
(f) repeating operations (b)-(e) during the index time for additional terms from the master index, where the number of documents containing each of the additional terms is greater than the threshold;
(g) generating a supplementary index containing the static rank of the each document in the set of documents that contains the term;
(h) storing the term rank corresponding to each term-document pair in the high ranking index and the low ranking index;
(i) at query time, determining whether each query term in a query is common, wherein the each query term is common if the each query term is contained in either one of the high ranking index or the low ranking index, wherein the each query term is uncommon if the each query term is not contained in both the high ranking index and the low ranking index;
(j) upon determining whether the each query term is common, populating a top document list with at least a subset of documents that satisfy the query, wherein the subset of documents comprises only documents in the high ranking index if the query term is common, and wherein the subset of documents comprises all documents from the master index if the query term is uncommon; and
(k) upon populating the top document list, transmitting the top document list to a ranking function configured to re-rank the documents in the top document list.
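The query-time operations (i) through (k) of claim 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function names, the dictionary-based index layout, the treatment of a multi-term query as a conjunction (a document must match every term), and the `rerank_by_static` stand-in for the ranking function are all hypothetical.

```python
def is_common(term, high_index, low_index):
    # Operation (i): a term is common if either split index contains it.
    return term in high_index or term in low_index


def candidate_docs(term, master_index, high_index, low_index):
    # Operation (j): a common term contributes only its high-ranking documents;
    # an uncommon term contributes every document listed in the master index.
    if is_common(term, high_index, low_index):
        return set(high_index.get(term, {}))
    return set(master_index.get(term, {}))


def top_document_list(query_terms, master_index, high_index, low_index, rerank):
    # Assumption: a document "satisfies the query" when it matches every term.
    per_term = [
        candidate_docs(t, master_index, high_index, low_index) for t in query_terms
    ]
    candidates = set.intersection(*per_term) if per_term else set()
    # Operation (k): hand the list to a ranking function for re-ranking.
    return rerank(sorted(candidates))


# Example data (invented for illustration).
master_index = {
    "search": {"d1": 0.9, "d2": 0.2, "d3": 0.5},
    "rare": {"d1": 0.4, "d3": 0.1},
}
high_index = {"search": {"d1": 0.9}}
low_index = {"search": {"d2": 0.2, "d3": 0.5}}
static_ranks = {"d1": 0.8, "d2": 0.1, "d3": 0.3}


def rerank_by_static(doc_ids):
    # Hypothetical stand-in for the final ranking function.
    return sorted(doc_ids, key=lambda d: static_ranks[d], reverse=True)


result_common = top_document_list(
    ["search"], master_index, high_index, low_index, rerank_by_static
)
result_rare = top_document_list(
    ["rare"], master_index, high_index, low_index, rerank_by_static
)
result_both = top_document_list(
    ["search", "rare"], master_index, high_index, low_index, rerank_by_static
)
```

For the common term "search", only the single high-ranking document reaches the ranking function, which is how the method reduces the ranking data analyzed at query time. The uncommon term "rare" falls back to its full master-index posting list.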
Specification